(57) Abstract

The invention provides modified reverse transcriptase polypeptides (Types I, II, and III), along with polynucleotides encoding such 
polypeptides, vectors containing such polynucleotides and host cells transformed with those polynucleotides. The modified RTs typically 
exhibit improved stability and/or improved solubility, relative to naturally occurring reverse transcriptases. The modified RTs are also found 
in a variety of forms, such as monomers as well as both homo- and hetero-multimers. The modified RTs may be used in any one or more 
of the methods known to benefit from reverse transcriptase activity, such as cDNA synthesis, and amplification techniques such as PCR 
and RAMP.

BIOLOGICALLY ACTIVE REVERSE TRANSCRIPTASES

FIELD OF THE INVENTION 

In general, the invention relates to the field of molecular biology. In particular, the 
invention relates to reverse transcriptases. 

BACKGROUND OF THE INVENTION

The defining activity of a reverse transcriptase (RT) is its ability to synthesize a 
cDNA strand using an RNA template. This activity has been exploited in a wide variety 
of techniques fundamental to progress in the academic and commercial arenas. For 
example, reverse transcription is useful in the production of cDNA molecules and libraries, 
sequence-specific probes having a variety of labels, sequencing techniques, and any of 
several amplification techniques. These amplification techniques include Reverse 
Transcription-Polymerase Chain Reaction (RT-PCR; Myers et al., Biochemistry
30:7661-7666 (1991) and U.S. Patent Nos. 5,310,652 and 5,407,800), Nucleic Acid
Sequence-Based Amplification (NASBA; Kievits et al., J. Virol. Methods 35:273-286
(1991) and U.S. Patent Nos. 5,130,238 and 5,409,818), Self-Sustained Sequence
Replication (3SR; Guatelli et al., Proc. Natl. Acad. Sci. (USA) 87:1874-1878 (1990)) and
Rapid Amplification (RAMP; PCT/US97/04170). Other amplification techniques take
advantage, at least in part, of the DNA-dependent DNA polymerase activity of some RTs.
Amplification techniques falling within this category include, e.g., the Polymerase Chain
Reaction (i.e., PCR; Saiki et al., Science 239:487-491 (1989) and U.S. Patent Nos.
4,683,195, 4,683,202 and 4,800,159), the Inverse Polymerase Chain Reaction, the
Multiplex Polymerase Chain Reaction, Strand Displacement Amplification (i.e., SDA;
Walker et al., Proc. Natl. Acad. Sci. (USA) 89:392-396 (1992), Walker et al., Nucl.
Acids Res. 20(7):1691-1696 (1992), and U.S. Patent Nos. 5,270,184 and 5,455,166), and
the Multiplex Strand Displacement Amplification (U.S. Patent Nos. 5,422,252 and
5,470,723).

Reverse transcriptases are found in a variety of retroviruses, or RNA tumor viruses.
Techniques for producing RT from these native sources involve isolation of virus particles
which contain about thirty RT molecules per virion. The RT is released from the virions
by lysis of the virion coat. Released native RTs may then be purified using conventional
techniques. However, the procedure involved in the production of these viruses is
labor-intensive and costly (1,000 infected chicks produce 10-20 grams of virus, which is
approximately 25,000-40,000 units/gram of virus). Additional problems with RT
production from natural sources are the high natural mutation rates which, in part, result
in restricted host ranges such as specific strains of chickens.
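
As a rough illustration of the scale implied by the figures quoted above, the following
Python sketch multiplies the stated virus yield by the stated activity per gram. The
function name, and the assumption that the low ends and the high ends of the two ranges
pair up, are illustrative only and do not come from the source.

```python
def rt_units_per_batch(virus_grams, units_per_gram):
    """Total RT units recoverable from a given mass of virus (hypothetical helper)."""
    return virus_grams * units_per_gram

# Figures quoted in the text for a batch of 1,000 infected chicks.
low_units = rt_units_per_batch(10, 25_000)   # 10 g of virus at 25,000 units/g
high_units = rt_units_per_batch(20, 40_000)  # 20 g of virus at 40,000 units/g

print(f"{low_units:,} to {high_units:,} units per 1,000 chicks")
```

Under these assumptions, a batch of 1,000 chicks corresponds to somewhere between
250,000 and 800,000 units of RT.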
An alternative source of RTs is recombinant production, which in turn is dependent
on an understanding of RT expression by the various retroviruses. In general terms,
retroviruses bind to receptors on susceptible cells and insert the retroviral core particle into
the cytoplasm of the host. Two major events occur in the life cycle of retroviruses. First,
the single-stranded RNA genome is converted to double-stranded DNA by reverse
transcriptase. Second, this DNA copy is inserted into the genome of the host cell
(Varmus, et al., In Mobile DNA (ed. Berg, et al.) pp 53-108 (1989), Washington D.C.:
Am. Soc. Microbiol. 972 pp; Brown, Curr. Top. Microbiol. Immunol. 157:19-48 (1990);
Goff, Cancer Cells 2:172-178 (1990a); Goff, J. Acquired Immune Defic. Syndr. 3:817-31
(1990b); Boeke, et al., Curr. Opin. Cell. Biol. 3:502-507 (1991)), an event typically
mediated by a virally encoded integrase activity. Following integration, this proviral DNA
can be transcribed by the host RNA polymerase to make viral RNA which is then
transported back to the cytoplasm for synthesis of various viral proteins. Virus assembly
takes place in the cytoplasm followed by release of budded viruses from the cell for
another round of infection (Whitcomb, et al., Ann. Rev. Cell Biol. 8:275-306 (1992)).
Any defect in the reverse transcription or integrase functions will result in a defective virus
that cannot replicate. As an example, Avian Myeloblastosis Virus (i.e., AMV) is a
defective virus that requires a helper virus such as Myeloblastosis-Associated Virus (i.e.,
MAV) for viral propagation.

Integrase ensures a stable association of viral and host DNAs. Integration is
site-specific with respect to the viral DNA but is essentially random with respect to the host.
This observation indicates that there is a DNA binding region in the integrase domain that
is necessary for the binding of viral and host DNAs, in a manner independent of host
sequence, during the integration process.

Although encoded by the cognate genes, the integrase domain is not found within
mature MMLV-RT (i.e., Moloney-Murine Leukemia Virus Reverse Transcriptase, a Type
I RT) or mature HIV-RT (i.e., Human Immunodeficiency Virus Reverse Transcriptase,
a Type II RT). However, the integrase domain is found as an integral part of the mature
avian RT (a Type III RT). The presence of this integral integrase domain, along with
thermostability, are two features of avian RTs that distinguish this class of RT from other
RTs. Investigations of the integrase domain of avian RTs have revealed that it functions
in DNA binding and in polymerization, or multimerization.

Some evidence for a DNA binding function comes from alignment of the deduced
amino acid sequences of retroviral integrases. Three potential functional domains have
been identified. An N-terminal region is characterized by an HHCC (Histidine, Cysteine)
zinc finger-like domain which stabilizes the structure of the integrase (approximately,
amino acids 579-629 of SEQ ID NO:2). The central region of these integrases contains
a catalytic domain which shares homology with bacterial transposases involved in the
breaking and joining of nucleic acid molecules (approximately, amino acids 630-807 of
SEQ ID NO:2). This region has acidic amino acid residues which have been proposed to
be involved in the binding of required metals (Mg2+ or Mn2+). Khan et al., Nucl. Acids
Res. 19:851-860 (1991), reported DNA binding activity in this central region. The
C-terminal region of these integrases is not conserved at the sequence level and its function
is unknown (approximately, amino acids 808-858 of SEQ ID NO:2). However, deletion
analyses indicate that this region contains strong sequence-independent DNA binding
activity as well.
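
The approximate boundaries quoted above can be tabulated to see how the three integrase
regions tile the C-terminal portion of SEQ ID NO:2. The Python sketch below is only an
illustration: the dictionary and helper names are hypothetical, and the coordinates are
treated as inclusive residue numbers, which is an assumption about how the ranges are
meant to be read.

```python
# Approximate integrase regions of SEQ ID NO:2 as quoted in the text
# (start and end residue numbers, assumed inclusive).
INTEGRASE_REGIONS = {
    "N-terminal zinc finger-like": (579, 629),
    "central catalytic": (630, 807),
    "C-terminal": (808, 858),
}

def region_length(start, end):
    """Number of residues in an inclusive residue range."""
    return end - start + 1

lengths = {name: region_length(*bounds) for name, bounds in INTEGRASE_REGIONS.items()}
total_length = sum(lengths.values())

# Check that each region begins directly after the previous one ends.
bounds = sorted(INTEGRASE_REGIONS.values())
contiguous = all(prev[1] + 1 == nxt[0] for prev, nxt in zip(bounds, bounds[1:]))

for name, n in lengths.items():
    print(f"{name}: {n} residues")
print(f"total: {total_length} residues, contiguous: {contiguous}")
```

Read this way, the regions span 51, 178, and 51 residues respectively, for 280 residues
in total, with no gaps between them.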

The integrase polypeptide functions as a multimer, or polymer. The N-terminal
zinc finger-like domain and the C-terminal deletion derivative have less tendency to
dimerize. Hickman et al., J. Biol. Chem. 269:29,279-29,287 (1994). Sedimentation
analyses suggest that integrase occurs as a mixture of monomers, dimers and tetramers.

The genome of the retroviruses codes for several genes, namely gag, pol, env, and
the cellular oncogenes, tat, ars/trs, nef, rev, etc. The pol gene codes for a polypeptide with
reverse transcriptase (RT) activity. The RT enzyme has several activities, such as
RNA-dependent DNA polymerase, DNA-dependent DNA polymerase, ribonuclease (RNase H),
integrase, endonuclease and, possibly, protease activities. In the laboratory, reverse
transcriptase is mainly used for its RNA-dependent DNA polymerase activity, which
elongates an oligonucleotide primer, such as a tRNA, annealed to a template RNA or DNA
strand to synthesize a DNA strand that is complementary to the template strand (cDNA)
(Copeland, et al., J. of Virology 36:115-119 (1980); Berger, et al., Biochemistry
22:2365-2372 (1983)).
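
The template-directed synthesis described above can be illustrated at the sequence level.
The following Python sketch, which is not drawn from the source, derives the first-strand
cDNA that is complementary to a short RNA template; the function name and the toy
template are hypothetical, and the sketch ignores priming, processivity, and errors.

```python
# Watson-Crick pairing of RNA template bases with DNA product bases.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand_cdna(rna_template):
    """Return the complementary DNA strand, written 5'->3', for an RNA template given 5'->3'."""
    rna = rna_template.upper()
    unknown = set(rna) - set(RNA_TO_DNA)
    if unknown:
        raise ValueError(f"unexpected RNA bases: {sorted(unknown)}")
    # The product is antiparallel to the template, so read the template backwards.
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))

toy_template = "AUGGCU"  # hypothetical template, for illustration only
toy_cdna = first_strand_cdna(toy_template)
print(toy_cdna)
```

For the toy template 5'-AUGGCU-3', the sketch yields 5'-AGCCAT-3'.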

Generally, there are three types of RT. The RT of Moloney-Murine Leukemia Virus (MMLV)
is monomeric, while HIV-RT and avian RTs are heterodimers. The HIV-RT
heterodimer consists of a 66 kDa β polypeptide and a 51 kDa α polypeptide. The avian
RT heterodimer consists of a larger 95 kDa β polypeptide and a 63 kDa α polypeptide.
The α polypeptides from HIV-RT and the avian RTs differ in that the HIV-RT α
polypeptide lacks RNase H activity. The β polypeptide of HIV-RT and the β polypeptide
of avian RTs differ in that the HIV-RT β polypeptide lacks the integrase activity of avian
RT β polypeptides.

AMV-RT occurs in nature in multiple molecular forms, such as monomers,
homodimers and heterodimers. However, the major active native form is a heterodimer
of two structurally related polypeptide chains, an α subunit of 63 kDa and a β subunit of
95 kDa. These mature subunits are the products of post-translational processing of a
precursor protein of 180 kDa (Gag + Pol). The 180 kDa protein is cleaved to a 95 kDa β
subunit. The β subunit may be further cleaved to a 63 kDa α subunit and a 32 kDa
endonuclease subunit. The α and β subunits have identical N-termini (Roth, et al., J.
Biol. Chem. 260:9326-9335 (1985); Gerard, et al., DNA 5:271-279 (1986)).
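
The subunit masses quoted above are mutually consistent, which a few lines of Python can
confirm. This is only a bookkeeping sketch: the constant names are hypothetical, and the
heterodimer mass is a nominal sum of the quoted subunit masses rather than a value stated
in the source.

```python
# Nominal masses quoted in the text, in kDa.
BETA_KDA = 95          # beta subunit
ALPHA_KDA = 63         # alpha subunit
ENDONUCLEASE_KDA = 32  # endonuclease fragment released from beta

# Cleaving beta should account for alpha plus the endonuclease fragment.
cleavage_balances = (ALPHA_KDA + ENDONUCLEASE_KDA == BETA_KDA)

# Nominal mass of an alpha/beta heterodimer (an arithmetic sum, not a measured value).
heterodimer_kda = ALPHA_KDA + BETA_KDA

print(cleavage_balances, heterodimer_kda)
```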

Beyond a difference in form (monomer v. heterodimer), the avian RTs differ from
MMLV-RT in other ways. In contrast to MMLV-RT, the avian reverse transcriptases
exhibit high processivity and yield, as well as biological activity (e.g., polynucleotide
polymerase activity) over a wider range of temperatures extending up to at least 70°C.
This ability to polymerize at higher temperatures is useful when working with RNA
templates that have secondary structures. Additionally, this temperature stability has been
exploited in amplification technologies such as NASBA and RAMP. Non-avian RTs,
including those RTs having RNase H activity, have relatively low processivity and yield.
For example, it has been estimated that approximately 50 times more MMLV-RT is
required than AMV-RT for cDNA synthesis.

In addition to Avian Myeloblastosis Virus, the avian retroviruses include Avian
Sarcoma Leukosis Virus (ASLV), Rous Sarcoma Virus (RSV), Avian Sarcoma Virus
(ASV), Avian Tumor Virus (ATV) and their helper viruses such as MAV, Avian Sarcoma
helper virus UR2AVRT, Rous-Associated Virus (RAV), and others. The homology among
the avian reverse transcriptases at the DNA level is between 90-98% and, at the amino acid
level, the homology is 95-100%.

Although the nucleotide sequences of many avian viruses are known (Schwartz et
al., Cell 32:853-869 (1983); see also Genbank Accession Nos. M24159, M37980, J02342,
J02021, J02343), cloning and expression of an active and stable RT in commercially
useful amounts has not been achieved.

When the DNA sequences of the pol gene of AMV and MAV were compared,
approximately 111 bp from the 3' end of MAV was found to be replaced by host DNA
sequences in AMV. Kan et al., Virology 145:323-329 (1985). The rest of the DNA
coding for the RNA- and DNA-dependent DNA polymerase and RNase H activities was
intact. This deletion involved the coding region for the integrase domain of the β
polypeptide, which causes AMV to be defective in the propagation of the virus, thereby
creating a requirement for helper virus MAV to produce infectious progeny virus. Hence,
the integrase domain is critical for producing infectious particles. Nevertheless, both the
avian retroviruses and their helper viruses encode reverse transcriptases having RNA- and
DNA-dependent polymerase and RNase H activities.

AMV Reverse Transcriptase (i.e., AMV-RT) has been characterized and conditions
for the synthesis of full-length cDNA products have been investigated. Berger et al.,
Biochemistry 22:2365-2372 (1983). However, the length and yield of cDNA produced by
AMV-RT have reportedly been limited by either a nuclease integral to AMV-RT or
associated contaminants. See, U.S. Patent No. 5,017,492. In efforts to maximize cDNA
length and yield, attention has turned to MMLV-RT. MMLV-RT is a reverse transcriptase
that is relatively thermosensitive and exhibits relatively low reverse transcriptase activity.
Efforts to improve the stability, and hence activity, of MMLV-RT reportedly met with
some success in the form of C-terminal truncations of MMLV-RT. U.S. Patent No.
5,017,492; see also U.S. Patent Nos. 5,244,797, 5,405,776, and 5,668,005. Beyond these
modifications, the '492 Patent reports that some C-terminal amino acid changes enhanced
MMLV-RT activity, albeit at the cost of a reduction in processivity. Notwithstanding these
improvements, MMLV-RT is relatively thermosensitive and inefficient in catalyzing cDNA
synthesis.

The avian RTs are structurally distinct from MMLV-RT. At the primary structure
level, avian RT, e.g., AMV-RT, shares no more than 28% amino acid sequence similarity
to MMLV-RT (no more than 50% similarity at the polynucleotide level). Moreover, the
native AMV-RT is a heterodimer composed of a 63 kDa alpha peptide and a 95 kDa beta
peptide while MMLV-RT is an 80 kDa monomer. Not surprisingly, these enzymes differ
in their thermostability. The thermophilic AMV-RT is active over a broad temperature
range extending, at least, to 70°C. Consequently, these avian RTs can often copy RNA
templates capable of forming relatively strong secondary structures. In contrast,
MMLV-RT is a mesophilic enzyme. Also, relative to AMV-RT, approximately 50-fold more
MMLV-RT is required for cDNA synthesis. Furthermore, AMV-RT and MMLV-RT
differ in other properties such as processivity, metal co-factor requirements, error rate
(i.e., rate of incorrect nucleotide incorporation), and tRNA primer preferences. These
drawbacks in using MMLV-RT, in turn, increase the cost of effectively using MMLV-RT.
Therefore, a need continues to exist in the art for a reverse transcriptase that can be
produced economically and that exhibits one or more improvements in terms of
processivity, stability, solubility, and thermal range, leading to increased lengths and yields
of polynucleotide products, while minimizing the cost of the reverse transcriptase.

SUMMARY OF THE INVENTION

The present invention relates to the discovery that reverse transcriptase polypeptides
which have been modified, e.g., by altering existing integrase domains or by adding integrase
domains that are themselves modified, are characterized by one or more improved properties,
which include increased activity, stability, and solubility, as well as increased ease and
versatility in producing such polypeptides. The reverse transcriptase polypeptides of the
invention may be derived from any source, including, but not limited to, Moloney-Murine
Leukemia Virus (a Type I reverse transcriptase or RT), HIV (Type II RTs), and avian
retroviruses (Type III RTs). One aspect of the invention is drawn to RT polypeptides that are
truncated internally and/or at their C-termini, yet retain RNA-dependent DNA polymerase
activity, the defining characteristic of reverse transcriptases. The truncated polypeptides may
also have, and preferably do have, DNA-dependent DNA polymerase activity. Preferred
polypeptides according to the invention exhibit RNase H activity. For those truncated
polypeptides corresponding to full-length reverse transcriptases having an integral integrase
activity (e.g., avian retroviral RTs or modified Type I and Type II RTs that retain an integrase
domain, unlike natural forms of these RTs), the truncation preferably extends into the
integrase domain, effectively eliminating integrase activity from the truncated polypeptide.
Such truncated polypeptides exhibit improvements in one or more of the following properties
compared to their full-length counterparts: RNA-dependent DNA polymerase activity,
expression levels, stability, and solubility. These improvements result in more cost-effective
RTs for use in a wide variety of DNA synthesis, amplification and sequencing technologies.

The invention also provides a chimeric RT polypeptide resulting from the effective
addition of a protein domain to the C-terminus of the truncated RT, resulting in a non-native
chimeric polypeptide (i.e., a polypeptide not found in nature). These protein domains provide
a DNA binding capability, a metal binding capability, a structure stabilizing capacity, or a
polymerization (i.e., multimerization) capability, and preferably several capabilities. With
these added, or enhanced, capabilities, the chimeric polypeptides of the invention exhibit
improvements in RNA-dependent DNA polymerase activity, protein expression levels, protein
stability, and/or protein solubility, with chimeric polypeptides of the invention frequently
showing improvement in all four properties. Preferred protein domains include a plurality of
histidine residues (i.e., His tags), and either the N-terminal domain (providing a DNA binding
capacity, preferably resulting from a zinc finger domain) or the C-terminal domain (providing
a polymerization domain) of the integrase region of a native RT.

More specifically, the invention provides reverse transcriptase polypeptide fragments
(i.e., portions of full-length RT polypeptides), modified reverse transcriptase polypeptides,
and analogs and variants thereof. Preferably, the polypeptides of the invention are
thermostable avian RTs that have improved RNA- and DNA-dependent DNA polymerase
activities, resulting in increased lengths and yields of synthesized polynucleotide products.
Typically, the polypeptides of the invention lack the catalytic activity of the integrase domain
provided by the C-terminal region of the full-length polypeptides (e.g., nucleotides 1719-2571
of SEQ ID NO:1 (Type III), nucleotides 2464-3012 of SEQ ID NO:40 (Type I), and
nucleotides 1840-2708 of SEQ ID NO:42 (Type II)). The absence of catalytic activity
provided by the integrase domain is expected to result in polypeptides that are more soluble
and expressed at higher levels; hence, such polypeptides are more amenable to economical
purification in commercially useful quantities. In addition to this benefit, the chimeric
polypeptides of the invention are expected to facilitate nucleic acid binding or polymerization
(homo-polymerization or hetero-polymerization), and preferably both activities, which
contribute to the improved performance of the polypeptides. The improved RT performance,
in turn, translates into improvements in the many techniques dependent on RT activity, such
as cDNA production and cDNA library preparation as well as a variety of polynucleotide
amplification and sequencing technologies. These amplification techniques include RT-PCR,
NASBA, 3SR, and RAMP. The improved DNA-dependent DNA polymerase activities of the
polypeptides of the invention are useful in, e.g., PCR, the Inverse Polymerase Chain Reaction,
the Multiplex Polymerase Chain Reaction, SDA, and Multiplex SDA. The sequencing
technologies include the many variations on the Sanger dideoxy sequencing technique.

One aspect of the invention is an isolated polynucleotide encoding a polypeptide
according to the invention. In general terms, the invention comprehends polynucleotides
encoding polypeptides having RT activities, those polynucleotides typically lacking
approximately 200-1,122 bp of the 3' ends of the corresponding native RT genes. For
example, a full-length avian RT gene (i.e., MAV pol) is 2,692 bp (SEQ ID NO:1) and the
invention contemplates MAV-derived polynucleotides of approximately 1,570-2,492 bp in
length. More generally, the polynucleotides of the invention may result from truncations
to RT-encoding polynucleotides derived from any source, including AMV, MAV, RSV,
ASLV, ATV, MMLV and HIV. In particular, the invention contemplates an isolated
polynucleotide encoding a polypeptide having RNA-dependent DNA polymerase activity,
the polypeptide consisting of any one of the following sequences: an amino acid sequence
beginning at amino acid 1 and terminating at any one of amino acids 428 to 857 of SEQ
ID NO:2; an amino acid sequence beginning at amino acid 1 and terminating at any one
of amino acids 428 to 1,054 of SEQ ID NO:39; an amino acid sequence beginning at
amino acid 1 and terminating at any one of amino acids 548 to 1,198 of SEQ ID NO:41;
an amino acid sequence beginning at amino acid 1 and terminating at any one of amino
acids 428 to 901 of SEQ ID NO:43; and variants, analogs and fragments of any of the
above-described polypeptides having RNA-dependent DNA polymerase activity, the
aforementioned polypeptides (i.e., polypeptides and variants, analogs, and fragments
thereof) optionally having an N-terminal methionine. An exemplary polynucleotide has
a sequence set forth in any one of SEQ ID NOs: 1, 7, 9, 38, 40, and 42. The
polynucleotides preferably comprise a start codon specifying methionine at the 5' end.
Other truncated polynucleotides of the invention have internal deletions, preferably
removing at least part of an integrase domain. For example, polynucleotides according to
the invention comprise the sequence set forth in SEQ ID NO:40, with part or all of
nucleotides 2464-3012 deleted, or comprise the sequence set forth in SEQ ID NO:42, with
part or all of nucleotides 1840-2708 deleted, or comprise the sequence set forth in SEQ ID
NO:1, with part or all of nucleotides 1719-2571 deleted (e.g., deletion of nucleotides
1860-2310, 1920-2310, or 1980-2310 of SEQ ID NO:1). Such polynucleotides encode
polypeptides that lack an effective integrase activity in that the polypeptides do not promote
detectable polynucleotide integration.
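
The length range quoted above for MAV-derived polynucleotides follows directly from the
full-length figure and the size of the 3' truncation. The Python sketch below reproduces
that arithmetic; the function and constant names are hypothetical and are used only for
illustration.

```python
FULL_LENGTH_MAV_POL_BP = 2692  # full-length MAV pol gene, as quoted in the text

def truncated_length(full_length_bp, removed_from_3prime_bp):
    """Length remaining after removing a given number of bp from the 3' end."""
    if not 0 <= removed_from_3prime_bp <= full_length_bp:
        raise ValueError("cannot remove more bases than the gene contains")
    return full_length_bp - removed_from_3prime_bp

# The text describes truncations of approximately 200 to 1,122 bp.
longest_bp = truncated_length(FULL_LENGTH_MAV_POL_BP, 200)
shortest_bp = truncated_length(FULL_LENGTH_MAV_POL_BP, 1122)

print(f"{shortest_bp:,}-{longest_bp:,} bp")
```

The result, 1,570-2,492 bp, matches the range stated for the MAV-derived polynucleotides.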

Other polynucleotides according to the invention encode chimeric polypeptides,
such polynucleotides comprising a polynucleotide encoding a polypeptide having
RNA-dependent DNA polymerase activity and an adjacent polynucleotide encoding a terminal
modification of that polypeptide, thereby encoding a chimeric polypeptide. Preferred
polynucleotides encode a chimeric polypeptide having one or more amino acids attached
to the C-terminus of a polypeptide having RNA-dependent DNA polymerase activity.
Such polynucleotides may contain one of the above-described coding regions fused (in
frame) at its 3' end to a region encoding one or more amino acids. For example, the 3'
end of a coding region may be fused to one or more codons for a charged amino acid such
as histidine, lysine, arginine, aspartate, or glutamate. Alternatively, the 3' end of the
coding region may be fused to a region encoding a polypeptide, preferably having four to
fifty (e.g., six) amino acids and preferably comprising a domain selected from the group
consisting of a DNA binding domain, an RNA binding domain, a metal binding domain,
a polymerization domain, and a structure stabilizing domain. Examples of such domains
include, but are not limited to, disulfide bond forming cysteine residues, a zinc finger
domain, an acidic amino acid domain, a basic amino acid domain, a bulky amino acid
domain (e.g., W or W-H, single-letter amino acid identifications), a PPG domain, a GPRP
or a PRPG (i.e., inverse GPRP) domain, a leucine zipper motif or domain, and an NS1
binding site, among others. Examples of suitable domains include, but are not limited to,
the N-terminal domain of the MAV-RT integrase region which provides a DNA binding
domain and the C-terminal domain of the integrase region which provides a polymerization
domain. Further, the polynucleotides encoding chimeric polypeptides having a plurality
of C-terminal amino acids may encode the same amino acid a number of times. Such
polynucleotides may encode basic (e.g., histidine) amino acids at the C-terminus. Also
preferred are polynucleotides that have a stop codon (e.g., TAA, TAG, or TGA) at the 3'
end of a coding region of a chimera according to the invention. An exemplary
polynucleotide encoding a chimeric polypeptide has a sequence selected from the group
consisting of a sequence set forth in any one of SEQ ID NOs: 11-19.
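
The in-frame 3' fusion described above can be sketched at the sequence level. The Python
example below appends six histidine codons and a stop codon to a coding region. It is an
illustration under stated assumptions, not the construction used in the invention: the
function name, the choice of CAT (one of the two histidine codons in the standard genetic
code), and the toy coding region are all hypothetical, and no sequence from the SEQ ID
listing is reproduced.

```python
HIS_CODON = "CAT"                    # histidine codon (CAC would serve equally well)
STOP_CODONS = ("TAA", "TAG", "TGA")  # stop codons named in the text

def add_c_terminal_his_tag(coding_region, n_his=6, stop="TAA"):
    """Fuse n_his histidine codons, then a stop codon, in frame to the 3' end."""
    seq = coding_region.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding region length must be a multiple of 3")
    if stop not in STOP_CODONS:
        raise ValueError(f"stop codon must be one of {STOP_CODONS}")
    # Drop an existing terminal stop codon so the tag is translated.
    if seq[-3:] in STOP_CODONS:
        seq = seq[:-3]
    return seq + HIS_CODON * n_his + stop

toy_coding_region = "ATGGCTTAA"  # hypothetical: Met-Ala-stop
tagged = add_c_terminal_his_tag(toy_coding_region)
print(tagged)
```

With the toy input, the result is ATGGCT followed by six CAT codons and a terminal TAA,
so the encoded polypeptide would end in six histidine residues.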

Still other polynucleotides of the invention encode a chimeric polypeptide having 
one or more amino acids attached to the N-terminus of a polypeptide having RNA- 
dependent DNA polymerase activity. In addition, the invention contemplates 
polynucleotides that encode more than one modification, such as an N-terminal peptide 
addition and a C-terminal peptide addition or a C-terminal peptide addition coupled to an
internal deletion of at least part of an integrase domain. 

The invention also provides a vector comprising any of the aforementioned 
polynucleotides. A preferred vector comprises a polynucleotide operably linked to a 
promoter. 

Another aspect of the invention is directed to a host cell transformed with a 
polynucleotide of the invention, such as prokaryotic (e.g., Escherichia coli) or eukaryotic 
cells (e.g., insect cells). In a related aspect, the invention comprehends a method of
transforming host cells comprising the following steps: introducing a vector according to 
the invention into a host cell; incubating the host cells; and identifying host cells containing 
the vector, thereby identifying a transformed host cell. 

Still another aspect of the invention is a method of producing an isolated reverse
transcriptase polypeptide comprising the steps of transforming a host cell with a vector as
described above, incubating the host cell under conditions suitable for expression of a
polypeptide, and recovering the polypeptide, thereby producing an isolated reverse
transcriptase polypeptide according to the invention.

In another aspect, the invention provides the polypeptides encoded by the
polynucleotides described above. These polypeptides include polypeptide fragments (e.g.,
RT fragments containing part, but not all, of the C-terminal integrase domain) and
chimeric polypeptides, as described above, as well as variants and analogs thereof. In
general terms, the invention contemplates all types of reverse transcriptase fragments and
chimeras (and variants and analogs thereof) including, but not limited to, the three classes
of RTs exemplified by MMLV-RT, HIV-1-RT, and avian RTs. Exemplary chimeric
polypeptides contain an N-terminal methionine or a C-terminal peptide providing useful
functions (e.g., expression enhancement, nucleic acid binding domains, metal binding
domains, structure stabilizing domains, or polymer-forming domains). Other chimeric
polypeptides according to the invention may result from modification of RTs derived from,
e.g., the following sources of Types I-III: ASLV, ATV, MMLV, HIV-1, and HIV-2. A
preferred addition to an RT is a C-terminal peptide comprising a plurality of amino acids
such as basic amino acids, a nucleic acid binding domain, a metal binding domain, or a
polymerization domain. Preferably, the C-terminal peptide provides more than one
functionally significant domain. Also preferred is one or more C-terminal cysteine
residues, which, at a minimum, provide a capacity to induce polypeptide homo- or
hetero-polymerization, such as dimerization. Typical polypeptides of the invention are relatively
soluble and are capable of being expressed at high levels, resulting in relatively high levels
of RT activity expected to facilitate economical purification.

Yet another aspect of the invention is an improvement in a method for copying a
target nucleic acid by extending a target nucleic acid-bound primer, the improvement
comprising: contacting the target nucleic acid and primer with a polypeptide according to
the present invention. The method preferably produces one or more copies of the target
nucleic acid and the polypeptide may be a polymer. Any method for copying a target
nucleic acid using a polymerase is comprehended by the invention, including, but not
limited to, cDNA synthesis, Polymerase Chain Reaction, Polymerase Chain
Reaction-Reverse Transcription, Inverse Polymerase Chain Reaction, Multiplex Polymerase Chain
Reaction, Strand Displacement Amplification, Multiplex Strand Displacement
Amplification, Nucleic Acid Sequence-Based Amplification, Sequence-Specific Strand
Replication and Rapid Amplification.

Another aspect of the invention is directed to improved methods for sequencing a
target nucleic acid by extending a target nucleic acid-bound primer, the improvement
comprising: contacting the target nucleic acid and primer with a polypeptide according to
the present invention.

Yet another aspect of the invention is a kit for copying a target nucleic acid
comprising one or more nucleotides and a polypeptide according to the invention.
Preferred polypeptides include those polypeptides encoded by a polynucleotide having a
sequence selected from the group consisting of SEQ ID NO:6, SEQ ID NO:7, SEQ ID
NO:8, SEQ ID NO:9, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14,
SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ
ID NO:38, SEQ ID NO:40, SEQ ID NO:42 and polynucleotide derivatives thereof
encoding C-terminal amino acids or polypeptides at their 3' ends.

Numerous other aspects and advantages of the present invention will be apparent 
upon consideration of the following drawing and detailed description. 



BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 photographically depicts Western blot analysis of RT expression products of 
insect cells. 

Fig. 2 illustrates recombinant RT fractionated on an 8% SDS-PAGE gel and stained 
with Coomassie Blue. 

Fig. 3 presents an autoradiograph of gel-fractionated cDNAs produced by an RT 
polypeptide according to the invention. 

Fig. 4 graphically presents temperature profiles for cDNA production using native 
and recombinant RTs (Fig. 4A), temperature profiles of nRT and rRT catalyzing RT-PCR 
(Fig. 4B), temperature profiles for RT-mediated RAMP (Fig. 4C), pH profiles for nRT 
and rRT in RT assays (Figs. 4D and 4E), magnesium ion profile for nRT and rRT in RT 
assays (Fig. 4F), and other divalent cation profiles for nRT and rRT in RT assays (Fig.
4G).

Fig. 5 illustrates the relative DNA-dependent DNA polymerase activities of native
RT and recombinant RT. 

Fig. 6 shows a graphic comparison of the relative RNase activities of native RT and 
recombinant RT at 37°C (Fig. 6A); Fig. 6B shows a temperature profile for the RNase H 
activity of rRT.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides truncated reverse transcriptase polypeptides (i.e., 
fragments), and analogs and variants thereof. Preferably, these polypeptides exhibit 
improved levels of RNA-dependent DNA polymerase activity, frequently extending over 
a wide range of temperatures up to 70°C and beyond. Also preferred are internally or 
terminally truncated polypeptides having sequences compatible with improved levels of 
expression. A preferred polypeptide according to the invention has a temperature optimum 
of 45°-55°C. Also preferred is a polypeptide consisting of an amino acid sequence set 
forth in SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:39, 
SEQ ID NO:41, or SEQ ID NO:43. Some of these polypeptides correspond to C-terminally 
truncated forms of avian reverse transcriptases, such as the full-length Myelogenic Avian 
Virus-Reverse Transcriptase (i.e., MAV-RT). A preferred polypeptide of the invention 
lacks an effective integrase catalytic activity and is expressed at elevated levels, providing 
a source of soluble, and recoverable, polypeptide in active form. Exemplary integrase 
domains include a Type I domain (nucleotides 2464-3012 of SEQ ID NO:40), a Type II 
domain (nucleotides 1840-2708 of SEQ ID NO:42) and a Type III domain (nucleotides 
1734-2571 of SEQ ID NO:1), any of which may be modified by internal or terminal 
deletion(s) or by substitution or chemical modification. Because integrase and RT function 
sequentially in the viral life cycle, it is possible that RT and integrase act in a complex. 
Thus, without wishing to be bound by theory, the added functions of nucleic acid binding 
and polymerization provided by the integrase domain of avian RTs may result in increased 
processivity and superior performance of such RTs. Accordingly, non-native chimeric 
polypeptides of the invention further include the C-terminal addition of a polymerizing 
domain, such as a plurality of the same, or different, amino acids. Non-native chimeric 
polypeptides are herein defined as polypeptides not found in nature. Thus, if the parts of 
the chimera are found in nature, they are not found in the same relationship as exists in the 
non-native chimeric polypeptide. Preferred C-terminal amino acid additions are basic 
amino acids, such as histidine, lysine and arginine. These preferred C-terminal additions 
may promote polymerization by metal chelation; the basic amino acids also may 
provide or enhance the nucleic acid binding capacity of the polypeptide. A preferred 
number of C-terminal amino acid additions is 4-50, more preferably six amino acids. As 
one alternative to a plurality of basic amino acids, one or more cysteine residues may be 
added to the C-terminus of the polypeptide. Other alternatives are C-terminal peptides of 
4-50 amino acids having a polymerizing capacity or a DNA binding capacity, and 
preferably both capacities. In addition to RNA-dependent DNA polymerase activity, the 
polypeptides may also have DNA-dependent DNA polymerase activities or RNase H 
activity. 

The invention also comprehends polypeptide variants, which have substantially the 
same amino acid sequence as one of the polypeptides described above. "Substantially the 
same" means that the sequence of the polypeptide may be aligned with one of the sequences 
disclosed herein, using any of the approaches known in the art (e.g., DNASIS, Hitachi 
Software Engineering America, Ltd., San Bruno, CA) such that the sequences are at least 
90%, and preferably 95% or 98%, similar throughout the aligned region. For example, 
the invention contemplates the conservative substitution of asparagine for aspartate at any 
one or more of amino acid positions 450, 505, or 564 of SEQ ID NO:2 to produce variant 
MAV-RT polypeptides lacking RNase H activity; that same substitution at any one or more 
of amino acid positions 497, 552, or 603 of SEQ ID NO:43 produces variants of HIV-RT 
polypeptides lacking RNase H activity. Other residues which may be changed by 
conservative substitution to generate RNase H⁻ variants of MAV-RT include amino acid 
positions 484, 549, and 572 of SEQ ID NO:2. More generally, the invention comprehends 
polypeptides having substantially the same amino acid sequences, regardless of whether the 
differences involve conservative substitutions or not. For example, the residues identified 
above may be changed in a non-conservative manner. In addition, other residues known 
to be involved in RNase H activity may be altered by substitution or deletion. These 
residues include, but are not limited to, amino acids at positions 441-578 of SEQ ID NO:2 
(AMV-RT and MAV-RT; see also, RSV-RT); positions 427-1,055 of SEQ ID NO:39 
(HIV-2-RT); positions 625-911 of SEQ ID NO:41 (MMLV-RT); and positions 427-902 
of SEQ ID NO:43 (HIV-1-RT). The invention also comprehends polypeptide analogs, 
which are defined herein as polypeptides that either contain known equivalents for one or 
more of the conventional amino acids or have been derivatized in a manner understood in 
the art (e.g., glycosylation, pegylation, phosphorylation), or both. 
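
The "substantially the same" threshold can be illustrated with a minimal sketch. It assumes two sequences that have already been aligned to equal length, with "-" marking gaps, and it scores percent identity as a simple stand-in for the similarity measure of an alignment package such as DNASIS, whose scoring is not reproduced here. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free aligned columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap character.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def substantially_the_same(aligned_a: str, aligned_b: str,
                           threshold: float = 90.0) -> bool:
    """Apply the 90% cutoff (95.0 or 98.0 for the preferred levels)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

A conservative-substitution matrix would be needed to score similarity rather than identity; identity is the stricter of the two, so a pair passing this check would also pass a similarity-based one.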

Another aspect of the invention is drawn to polynucleotides encoding the 
aforementioned polypeptides. A preferred polynucleotide consists of the sequence set forth 
as SEQ ID NO:1, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID 
NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID 
NO:15, SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID 
NO:38, SEQ ID NO:40, or SEQ ID NO:42. Also contemplated by the invention are 
polynucleotides substantially the same as the polynucleotides having one of the 
above-identified sequences. In the context of polynucleotides, "substantially the same" means 
that the polynucleotide has a sequence that is at least 90% homologous to one of the 
above-described polynucleotides. 

Beyond the polynucleotides, the invention provides vectors containing at least one 
of these polynucleotides. Further, these vectors may be functional in prokaryotic cells, 
eukaryotic cells, or both cell types. A preferred vector is a Baculovirus vector such as 
pBacPak9 (Clontech Inc., Palo Alto, CA). The invention also provides prokaryotic and 
eukaryotic host cells transformed with the above-identified polynucleotides. A preferred 
host cell is an Sf9 insect cell transformed with a Baculovirus-based recombinant molecule 
of the invention. Other insect cell lines, such as Sf21 and High Five, may also be used. 

In another aspect, the invention provides methods of using the polynucleotides to 
produce RTs according to the invention. In particular, the polynucleotides are transformed 
into a prokaryotic or eukaryotic host cell under conditions that allow expression of the 
encoded RT polypeptide and, following an incubation period, the polypeptide is isolated. 

In yet another aspect of the invention, methods of using the RT polypeptides are 
provided. These methods realize the benefits of speed and yield from using highly active 
and thermostable RT polypeptides to copy target nucleic acids (e.g., cDNA synthesis, 
cDNA library construction), amplify, or sequence a target nucleic acid. Suitable 
amplification methodologies include, but are not limited to, PCR, RT-PCR, Inverse PCR, 
Multiplex PCR, SDA, Multiplex SDA, NASBA, 3SR, and RAMP. Suitable sequencing 
methodologies include the original enzymatic sequencing technology disclosed by Sanger 
and co-workers, or any of the numerous variations of that technique that have been 
developed since that disclosure. 

Various aspects of the invention are described in the following Examples, wherein 
Example 1 describes the cloning of a coding region encoding the full-length MAV-RT; 
Example 2 describes the sequencing of the full-length MAV pol gene encoding reverse 
transcriptase; Example 3 discloses the cloning of selected polynucleotides according to the 
invention; Example 4 details the large-scale purification of the expressed recombinant RT; 
Example 5 describes SDS-PAGE and Western blot analyses of expressed proteins; Example 
6 discloses an assay for RNA-dependent DNA polymerase activity; Example 7 illustrates 
assays characterizing the native reverse transcriptase (nRT) and recombinant reverse 
transcriptase (rRT) in terms of optima for temperature, pH, MgCl2, and other divalent 
cation concentrations; Example 8 discloses use of RTs in methods for copying and/or 
amplifying target nucleic acids; Example 9 describes a DNA-dependent DNA polymerase 
assay used to characterize nRT and rRT; Example 10 reports a comparison of the RNase 
H activities of nRT and rRT; and Example 11 describes the cloning and expression of 
additional polynucleotides according to the invention. 

Example 1 

The pol gene of MAV, encoding the full-length RT precursor polypeptide, was 
cloned from pMAV, a pBR322 derivative containing the pol, gag and partial env gene of 
MAV. Data derived from a partial restriction map of the insert fragment of pMAV is 
shown in Table I. Based on the map data, the pol coding region, along with some 5' and 
3' non-coding sequences, was excised and ligated into several prokaryotic and eukaryotic 
vectors, as described below. Several recombinants were obtained from these vectors. 
Anti-RT monoclonal antibodies were used to analyze the expression of RT (see Example 
5). 



Table I 

Feature              Relative Position (bp) 
EcoRI                69 
PstI                 200 
Start codon (pol)    253 
BglII                1988 
KpnI                 2748 
Stop codon (pol)     2943 
XhoI                 3013 
PstI                 3155 

All non-coding 5' (i.e., upstream) nucleotides were removed to increase the 
expression of RT. Also, the open reading frame of the natural RT gene starts with an 
"ACT" (Thr), which is not a frequently used start codon in prokaryotes. The codon that 
is most frequently used is "ATG" (Met). "ATG" can serve as a start codon for efficient 
expression of RT in both prokaryotes and eukaryotes. Therefore, an "ATG" was added 5' 
to the natural "ACT" start codon in order to allow efficient expression of the protein in 
prokaryotes and eukaryotes (ATG ACT GTT GCG CTA CAT CTG GCT ATT CCG CTC 
AAA TGG AAG CCA AAC CAC ACG CCT GTG TGG ATT TTC CAG TGG CCC, 
etc.; compare the sequences provided in SEQ ID NOs 2 and 3). 
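
The effect of adding the "ATG" codon can be checked with a short sketch that translates the 5' end of the coding region quoted above, with and without the added codon. The standard genetic code is assumed, and the helper and constant names are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a DNA string (spaces ignored) in reading frame 1."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# 5' end of the natural pol open reading frame, as quoted in the text.
NATIVE_5PRIME = ("ACT GTT GCG CTA CAT CTG GCT ATT CCG CTC AAA TGG AAG "
                 "CCA AAC CAC ACG CCT GTG TGG ATT TTC CAG TGG CCC")
# Same region with the added Met initiation codon.
MODIFIED_5PRIME = "ATG " + NATIVE_5PRIME

print(translate(NATIVE_5PRIME))    # begins with Thr (T)
print(translate(MODIFIED_5PRIME))  # begins with Met (M), then the same residues
```

The modified reading frame is unchanged downstream of the added codon, so the encoded polypeptide differs from the natural one only by the N-terminal methionine.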

Construction of Prokaryotic Recombinant Vectors 

pll contains a strong and tightly regulated lambda PR promoter, a temperature-sensitive 
λ cI repressor, an E. coli origin of replication, and AmpR for selection. Because 
this vector encodes a temperature-sensitive repressor, a special E. coli strain was not 
required for regulation of expression. 

The entire coding region of the MAV-RT (EcoRI-XhoI fragment, obtained by 
restriction digestion or PCR with suitable primer pairs), as characterized by the restriction 
map data of Table I, was inserted into the multiple cloning site (MCS) of pll. Briefly, the 
vector was restricted with EcoRI and SalI. A 1:1 ratio of insert to vector was ligated in 
the presence of 1 mM ATP in ligation buffer (100 mM Tris-HCl, pH 7.6, 10 mM MgCl2, 
20 mM DTT) and T4 DNA ligase using a conventional protocol. Sambrook et al., in 
Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold 
Spring Harbor NY (2d Ed. 1989). The ligation mix was incubated at 16°C for 2-4 hours. 
The ligated mix was transformed into electro-competent E. coli cells in 1 mm cuvettes 
using a BioRad electroporator at 1.8 kV and 200 ohms. The transformed cells were 
plated on LB-ampicillin plates and single colonies were picked for overnight growth and 
mini-prep analyses. The recombinants were then confirmed by sequence analyses. 
Subsequently, the 5' non-coding regions of selected recombinants were removed by 
site-directed mutagenesis where appropriate. The pll vector containing the full-length RT gene 
was named pHSEM1 and the vector having the 5' non-coding region deleted was called 
pHSEM1E33 (i.e., pHRT). The RT protein was expressed and analyzed by SDS-PAGE 
and RT assays were performed as described in Examples 5 and 6. Other prokaryotic 
vectors were also successfully used (e.g., pET21d and pTZ18U, which have the T7 
promoter and the lacZ promoter, respectively). 

Construction of Eukaryotic Recombinant Transfer Vectors 

A baculoviral expression system consisting of a transfer vector, a wild-type virus 
AcMNPV (Autographa californica nuclear polyhedrosis virus) or a derivative of AcMNPV 
(i.e., BacPak6 (Clontech Inc.)) was used to obtain recombinant transfer vectors containing 
the RT gene. 

The AcMNPV genome is a double-stranded circular DNA of 134 kb. The size of 
the virus makes it difficult to directly manipulate the viral genome itself. Therefore, 
transfer vector pBacPak9 was used to generate recombinant molecules in accordance with 
the invention, such as pMBacRT, pBacM1BA, pBacM1KA, pBacM1BAhis and 
pBacM1KAhis (see below). These recombinant molecules, containing exogenous and 
typically foreign RT coding regions, were used to introduce the sequence into the viral 
genome for expression and propagation. Vector pBacPak9 has a strong polyhedrin 
promoter which is induced in insect cells late in the replication cycle of the virus. Hence, 
foreign genes, including lethal genes, expressed with this late promoter are not toxic to the 
growing cell. The polyhedrin gene is not necessary for the maintenance of the virus and 
was therefore replaced by the foreign gene (i.e., MAV-RT pol gene). 

The 2.81 kb PstI fragment from pHSEM1, containing the full-length RT gene, was 
inserted into the PstI site of the MCS of pBacPak9, and the recombinants were called 
pBpMPC3,4 (i.e., pBacRT). Insertions of the gene were confirmed by miniprep analyses 
and sequencing. The 5' non-coding region (see SEQ ID NO:1) was removed by 
site-directed mutagenesis, as described in Sambrook et al. (1989). The resulting recombinant 
vector was called pBpHPCM10,11,17 (i.e., pMBacRT). In pMBacRT, the RT gene is 
flanked by viral DNA sequences of BacPak6, a derivative of AcMNPV. When pMBacRT 
was introduced into insect cells along with BacPak6 DNA, the plasmid recombined with 
the BacPak6 DNA to yield recombinant, infectious progeny virus (M1-5 and M1-6, 
collectively M1-5,6) containing the RT gene. 

In general, when Sf9 tissue culture cells are infected with recombinant virus, the 
viral particles enter the cells and the viral DNA is uncoated in the nucleus. Viral DNA 
replication occurs approximately 6-24 hours post-infection. During the late phase of the 
viral infection, approximately 48-72 hours after virus infection, all transcription is shut off 
except that of the genes having the polyhedrin and p10 promoters, which are transcribed at very 
high levels. Hence, the RT gene under the control of the polyhedrin promoter in the 
recombinant virus was expressed at high levels late in the infection cycle. This 
recombinant AcMNPV was propagated in the budded form only. 

Example 2 

A primer walking sequencing strategy implementing Sanger's enzymatic sequencing 
technique was used to confirm the sequence of the MAV-RT pol gene. Sambrook et al. 
(1989). The sequencing template was the insert of pHSEM1. Primers were designed to 
be homologous or complementary to an end of a previously determined sequence. These 
primers were then used to progressively extend the identification of pol gene sequence until 
the sequence of the entire coding region had been determined. 

The polynucleotide sequence of the MAV-RT gene and the flanking sequences are 
set forth as SEQ ID NO:1. Amino acid sequences encoded thereby are set forth in SEQ 
ID NO:2. Of the 3,155 bp presented in SEQ ID NO:1, 2,498 bp code for the beta 
fragment (nucleotides 253-2751 of SEQ ID NO:1) of MAV-RT; the alpha fragment of 
MAV-RT is encoded by nucleotides 253-1990 of SEQ ID NO:1. These coding regions are 
expected to encode polypeptides containing amino acids 1-895 of SEQ ID NO:2 
(full-length RT; see also SEQ ID NO:3), amino acids 1-833 of SEQ ID NO:2 (β-like 
polypeptide; see also, SEQ ID NO:5) and amino acids 1-579 of SEQ ID NO:2 (α-like 
polypeptide; see also, residues 1-578 of SEQ ID NO:4). The β-like polypeptide is a 
fragment of the native MAV-RT β polypeptide. The α-like polypeptide is larger than the 
native MAV-RT α polypeptide and smaller than the native MAV-RT β polypeptide, with 
the native α polypeptide sequence extending from the N-terminus of the α-like polypeptide. 
For brevity, the α-like and β-like polypeptides are referred to as the α and β polypeptides, 
respectively. 
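
The relationship between the nucleotide coordinates and the stated polypeptide lengths can be checked with simple arithmetic. The sketch below assumes that each coordinate range is inclusive of both endpoints and that translation begins at nucleotide 253; the function name is hypothetical.

```python
def complete_codons(first_nt: int, last_nt: int) -> int:
    """Number of whole codons in an inclusive nucleotide range."""
    span = last_nt - first_nt + 1  # inclusive span length in nucleotides
    return span // 3               # discard any trailing partial codon


# Coordinates from SEQ ID NO:1 as given in the text.
beta_like = complete_codons(253, 2751)   # beta fragment coding region
alpha_like = complete_codons(253, 1990)  # alpha fragment coding region

print(beta_like, alpha_like)
```

Under these assumptions the two ranges hold 833 and 579 complete codons, in agreement with amino acids 1-833 and 1-579 of SEQ ID NO:2.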

Example 3 

As described in Example 1, plasmids pHSEM1 and pBacRT were constructed to 
contain 2.95 kb and 2.81 kb inserts, respectively. These fragments contained the entire 
reverse transcriptase gene along with 5' and 3' non-coding regions. The 5' non-coding 
region of each construct was then removed by site-directed mutagenesis, a well-known 
technique in the art. In particular, the primer FSDRT 
(5'-TGTACTAAGGAGGTGTTCATGACTGTTGCGCTACAT-3'; SEQ ID NO:20) was used with pHSEM1 as a 
template to generate pHRT (pHSEM1E33). Primer RSDBAC2 
(5'-GCCAGATGTAGCGCAACAGTCATATTTATAGGTTTTTTTATTAC-3'; SEQ ID NO:21) was used 
with pBacRT as a template to generate pMBacRT (pBPHPC3M10, pBPHPC3M11, 
pBPHPC3M17 or, respectively, pMBac10, pMBac11 and pMBac17). 
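
As a consistency check on the two mutagenic primers, the following sketch compares them with the modified 5' end of the coding region given in Example 1 (ATG ACT GTT GCG CTA CAT CTG GC...). The RSDBAC2 sequence is entered as read here from the printed text, which is an assumption, and the variable names are illustrative. FSDRT is expected to end in the sense-strand sequence, while RSDBAC2, a reverse primer, is expected to be complementary to it.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Added ATG plus the first codons of the pol coding region (Example 1).
CODING_START = "ATGACTGTTGCGCTACATCTGGC"

FSDRT = "TGTACTAAGGAGGTGTTCATGACTGTTGCGCTACAT"             # SEQ ID NO:20
RSDBAC2 = "GCCAGATGTAGCGCAACAGTCATATTTATAGGTTTTTTTATTAC"   # SEQ ID NO:21 (as read)

# Forward primer: its 3' end matches the first six codons (18 nt).
forward_ok = FSDRT.endswith(CODING_START[:18])

# Reverse primer: its reverse complement ends with the coding start,
# preceded by upstream (vector-derived) sequence.
reverse_ok = reverse_complement(RSDBAC2).endswith(CODING_START)

print(forward_ok, reverse_ok)
```

Both checks pass for the sequences as entered, which places the ATG codon directly upstream of the natural ACT codon in each construct.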

The full-length RT coding region was used as a starting material in constructing 
deletion derivatives that lacked the 3' end of the MAV-RT coding region to varying 
extents. Relative to the full-length gene (M1-5,6; see below), the 3' (C-terminal) deletion 
extending to the KpnI site (M1KA) increased the RT expression level, as evidenced by 
SDS-PAGE. Relative to the full-length gene (M1-5,6), deletion of the region extending 
from the BglII site to the 3' terminus (M1BA) increased the RT expression level, activity 
and solubility, as evidenced by SDS-PAGE and activity assays (see below). Relative to 
the alpha fragment of MAV-RT, the beta fragment has an additional 254 amino acids at 
the C-terminus, which provides an integrase activity. This region of the polypeptide 
contributes to the insolubility of the polypeptide and reduces its recovery from cell 
extracts, as shown by the relative insolubility of a (+) integrase form of RT (e.g., the 
M1KA gene product, see below) compared to a (-) integrase form (e.g., the M1BA gene 
product). Because the integrase domain is only needed for the retroviral life cycle and not 
for the RNA- or DNA-dependent DNA polymerase activities, this region was deleted in 
M1BA (α fragment equivalent). Note that the α fragment of M1BA (amino acids 1-578 
of SEQ ID NO:2) is larger than the naturally occurring α fragment of MAV-RT (amino 
acids 1-573 of SEQ ID NO:2). Without wishing to be bound by theory, this deletion was 
expected to result in an increase in the solubility, and hence recovery, of the protein. 

Using the full-length RT recombinants, additional clones were constructed to 
express polypeptides having C-terminal deletions in order to increase the levels of 
expression and to stabilize the RT activity (RNA-dependent DNA polymerase activity). 
Convenient restriction sites such as BglII (spanning nucleotides 1,986-1,991 of SEQ ID 
NO:1) and KpnI (spanning nucleotides 2,745-2,750 of SEQ ID NO:1) were used to 
eliminate the 3' end of the coding region of the RT gene (see Table I). The 3' deletion 
derivatives, encoding RT polypeptide fragments having C-terminal deletions, were 
obtained by BglII-PstI or KpnI-PstI restrictions of pMBacRT and pHRT, respectively 
(BglII and KpnI sites in the MAV-RT coding region; PstI site in the vector). Recombinant 
molecules containing the BglII-PstI 3' terminal deletion were designated pBacM1BA and 
pHBRT (pH33ABP6) and recombinant molecules containing the KpnI-PstI deletion were 
designated pBacM1KA and pHKRT (pH33AKP5). The deletion derivatives pBacM1BA and 
pBacM1KA had approximately 1.2 and 0.4 kb deletions from the 3' end of the full-length 
gene (see SEQ ID NO:1), respectively. The fragment bounded at its 3' end by the BglII 
site (SEQ ID NO:6) was used to express an alpha fragment equivalent of RT and the 
fragment bounded by the KpnI site (SEQ ID NO:8) was used to express the beta fragment 
equivalent of RT (the β fragment equivalent of M1KA contained amino acids 1-832 of SEQ 
ID NO:2; native MAV-RT β contains amino acids 1-858 of SEQ ID NO:2). 
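
The approximate deletion sizes can be related to the restriction map of Table I with a short calculation. This sketch assumes that each deleted segment runs from the named site to the downstream PstI site at position 3155; the dictionary keys and function name are illustrative only.

```python
# Relative positions (bp) taken from Table I.
TABLE_I = {
    "EcoRI": 69,
    "start codon (pol)": 253,
    "BglII": 1988,
    "KpnI": 2748,
    "stop codon (pol)": 2943,
    "XhoI": 3013,
    "PstI (3')": 3155,
}


def deletion_kb(site: str) -> float:
    """Size in kb of the segment from `site` to the downstream PstI site."""
    length_bp = TABLE_I["PstI (3')"] - TABLE_I[site]
    return round(length_bp / 1000, 1)


print(deletion_kb("BglII"))  # BglII-PstI deletion (pBacM1BA)
print(deletion_kb("KpnI"))   # KpnI-PstI deletion (pBacM1KA)
```

Under this assumption the BglII-PstI and KpnI-PstI segments span 1,167 bp and 407 bp, consistent with the approximately 1.2 kb and 0.4 kb deletions described above.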

Miniprep and sequencing analyses were done to confirm the identities of the 
recombinant clones described above. Recombinant viruses obtained from co-transfection 
with virus BacPak6 and transfer vector pBacM1BA or pBacM1KA were called M1BA and 
M1KA, respectively. 

Recombinants encoding 3' terminal amino acid tags 

Without wishing to be bound by theory, the constructs that deleted the integrase 
domain of RT, such as M1BA and pBacM1BA, were not expected to retain the DNA 
binding, structure stabilizing, and polymerization functions attributable to the integrase 
domain. To re-introduce these functions, without the deleterious impact on solubility and 
host cell viability associated with the native integrase domain, codons specifying amino 
acids (His) were added to the 3' end of the modified RT coding regions. The basic nature 
of the added amino acids may have been responsible for increased binding to the negatively 
charged nucleic acids, enhancing the stability of the polypeptides. The increased binding 
may, in turn, have been responsible for the increase in activity found with the his-tagged 
RTs, relative to their untagged counterparts. In addition, the his tags may have contributed 
to the tendency of the his-tagged RTs of the invention to form polymers, perhaps through 
his-mediated chelation of metal ions such as Ni2+. A his-tagged RT (M1BAhis) was found 
in homo-polymeric form (molecular weight greater than 200 kDa), as determined using 
non-denaturing PAGE and molecular sieve chromatography with Superose 12HR 10/30 
(separation range of 1-300 kDa; Pharmacia-Upjohn). Thus, the invention contemplates RT 
polypeptides lacking an effective integrase domain, but having the capacity to bind DNA 
and/or polymerize. These additional functionalities may be provided by adding, preferably 
at the C-terminus of the modified RT, such structures as known DNA binding domains, 
zinc finger or zinc-finger-like domains, polymerization domains, acidic amino acids, basic 
amino acids, or one or more cysteines. Such modified RTs may be ultimately derived 
from avian or non-avian sources. 

His-tag additions to the C-termini of the RT polypeptides were achieved by 
recombinant expression of coding regions fusing RT coding regions to His codons. In 
particular, the fusions were constructed by adding oligonucleotides containing 6 histidine 
codons at the 3' end of the RT gene using ligase, as in the case of the construction of 
pBacM1KAhis, or by PCR amplification with oligonucleotides that specified 6 histidine 
codons, as in the case of the construction of pBacM1BAhis. 

The construction of pBacM1KAhis was accomplished with oligonucleotides FNhis 
(SEQ ID NO:33) and RNhis (SEQ ID NO:34), each of which contained internal histidine 
codons and compatible NotI restriction sites at each end. Following their conventional 
syntheses, the oligonucleotides were annealed and ligated to the 3' terminus of RT in 
pBacM1KA cut with NotI. For the construction of pBacM1BAhis or pBacM1KAhis using 
PCR, primers FRT (SEQ ID NO:22) and either M1BARSDhis (SEQ ID NO:23) or 
M1KARSDhis (SEQ ID NO:24) were used with pHSEM1 as the template. Blunt-ended 
and phosphorylated PCR products containing the 3' deletions and histidine tag-encoding 
regions were inserted into the SmaI site in the MCS of pBacPak9. The his-tag derivatives 
of the transfer vectors were called pBacM1BAhis and pBacM1KAhis and the viruses 
obtained by co-transfection of Sf9 cells with the aforementioned transfer vectors and 
BacPak6 were called M1BAhis ((-) integrase) and M1KAhis ((+) integrase), respectively. 
Introduction of the His codons led to increased activity of the encoded polypeptides in 
eukaryotes, as measured by SDS-polyacrylamide gel electrophoretic analyses and RT 
assays (see below). As shown below, the his-tag additions increased the stability (perhaps 
by providing a DNA binding site), activity, polymerization capabilities and ease of 
purification of RTs such as M1BAhis. 
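
The general idea of a C-terminal His-tag fusion can be sketched in a few lines. The code below is a hypothetical illustration, not the oligonucleotides of SEQ ID NOs:23, 24, 33 or 34: the choice of histidine codon (CAC) and stop codon (TAA), and the function name, are assumptions made only for this example.

```python
HIS_CODONS = ("CAT", "CAC")          # the two codons that specify histidine
STOP_CODONS = ("TAA", "TAG", "TGA")


def add_c_terminal_his_tag(coding_region: str, n_his: int = 6,
                           his_codon: str = "CAC",
                           stop_codon: str = "TAA") -> str:
    """Append n_his histidine codons and a stop codon to an in-frame ORF."""
    coding_region = coding_region.upper()
    if his_codon not in HIS_CODONS:
        raise ValueError("his_codon must be CAT or CAC")
    if stop_codon not in STOP_CODONS:
        raise ValueError("stop_codon must be TAA, TAG or TGA")
    if len(coding_region) % 3 != 0:
        raise ValueError("coding region must be a whole number of codons")
    # Remove an existing stop codon so the tag is translated in frame.
    if coding_region[-3:] in STOP_CODONS:
        coding_region = coding_region[:-3]
    return coding_region + his_codon * n_his + stop_codon


tagged = add_c_terminal_his_tag("ATGACTGTTGCGCTACATTGA")
print(tagged)
```

The sketch only shows why the tag must be joined in frame and upstream of the stop codon; in the constructs described above the tag-encoding sequence was supplied either by ligated oligonucleotides or by a PCR primer.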

The 5' end of the MAV pol gene was also modified. Beyond deletion of the 5' 
non-coding sequence of pol (see the description of pHRT and pMBac10 above), the widely 
recognized Met initiation codon ("ATG") was introduced immediately upstream of the 
natural start codon (the Thr codon "ACT" at nucleotides 253-255 of SEQ ID NO:1) of the 
MAV pol gene. 

In general, the above-described cloning strategy reflected efforts to eliminate the 
integrase domain of avian RT and thereby avoid the insolubility and lethality problems 
associated with that protein domain. Deletion of 192 bp from the 3' terminus of the 
full-length MAV-RT gene (SEQ ID NO:1) by terminating the coding region at the KpnI site 
(Table I) produced the "M1KA" clone series. These clones coded for a β polypeptide that 
is smaller than the naturally occurring β polypeptide. These clones exhibited enhanced RT 
expression and the expressed polypeptides exhibited enhanced activity levels (compare 
below, the expression of M1-5,6 [full-length] to M1KA [β polypeptide]). Larger deletions 
extending from the 3' end of the full-length MAV-RT gene were constructed using a 
convenient BglII site to generate the M1BA clone series. These clones encoded an α 
subunit of RT that was larger than the naturally occurring α polypeptide. The M1BA 
clones exhibited increased expression and activity, in comparison to the expression and 
activity of full-length MAV-RT; moreover, M1BA was more soluble than naturally 
occurring MAV-RT. 

The invention also contemplates polynucleotides and polypeptides resulting from 
a recognition that some advantageous properties of the integrase, e.g., DNA binding and 
polymerization, could be re-introduced into avian RTs without re-introducing the 
deleterious (i.e., insolubility and lethality) characteristics of the avian RT integrase 
domain. One approach is to attach RT integrase domains or non-RT integrase domains 
known in the art to the (-) integrase polypeptides or attach the coding regions of these 
domains to the polynucleotides encoding these (-) integrase polypeptides. Another 
approach is to add amino acid tags to the (-) integrase RT polypeptides (or corresponding 
codons to (-) integrase polynucleotides) as disclosed herein. A preferred tag is a basic 
amino acid tag such as a His tag. As disclosed below, a His tag was attached at the 
C-terminus of an α polypeptide equivalent (M1BAhis). This clone exhibited relatively high 
levels of expression, activity and solubility. Thus, the invention provides avian RTs 
improved in terms of expression and activity levels, and in terms of solubility and ease of 
purification, while retaining the processivity and thermostability characteristic of avian 
RTs. 

Accordingly, the invention contemplates the construction of analogous 
polynucleotides and recombinant molecules encoding RT polypeptides of unnatural length 
from other sources, such as MMLV, HIV, RSV, ASLV, ATV, and others. Further, the 
invention extends to polynucleotides encoding these RTs of modified length, or full-length 
RTs, provided that the polynucleotides additionally encode polymerizing or nucleic acid 
binding domains, and preferably both domains, at their 3' termini. Examples of 
polynucleotides encoding a non-avian RT of unnatural length are polynucleotides encoding 
an RT portion or fragment having the amino acid sequence set forth at any one of the 
following: positions 1-765 of SEQ ID NO:39 (an HIV-2 RT sequence), positions 1-800 
of SEQ ID NO:41 (an MMLV-RT sequence), and positions 1-625 of SEQ ID NO:43 (an 
HIV-1 RT sequence). These polynucleotide sequences have some correspondence to the 
sequence of the polynucleotide encoding the MAV-derived M1BA polypeptide and are 
expected to function in a manner analogous to polynucleotides encoding M1BA. Of 
course, a polynucleotide encoding the full-length p polypeptide of HIV-2 (SEQ ID 
NO:38), or encoding equivalent polypeptides from MMLV or HIV-1 (SEQ ID NO:40 or 
SEQ ID NO:42, respectively), along with a 3' terminal sequence encoding a polymerizing 
and/or nucleic acid binding domain, is also contemplated by the invention. 

With respect to polypeptides, the invention comprehends the polypeptides encoded 
by the above-described polynucleotides, as well as polypeptides that have a C-terminal 
polymerizing and/or nucleic acid binding domain that has been added by means other than 
expression. For example, an RT polypeptide having a Cys residue or a His residue 
attached at the C-terminus by chemical condensation falls within the scope of the present 
invention. In addition, effective elimination of an integrase domain, such as is found in 
avian RTs, may be effected by altering a suitable coding region by inserting, deleting, or 
substituting (transitions and/or transversions), one or more nucleotides. Thus, the 
invention contemplates RT polypeptides that are the same length as naturally occurring RT 
polypeptides. These RT polypeptides may have the same amino acid sequence as naturally 
occurring RTs, provided that the RTs of the invention have a polymerizing and/or nucleic 
acid binding domain at their C-termini. Alternatively, RTs of the same length as natural 
RTs may have sequences that differ from the natural RTs, thereby effectively eliminating 
integrase activity. The RTs of the invention may also be shorter, or longer, than naturally 
occurring RT polypeptides. The shorter RT polypeptides of the invention eliminate some, 
or all, of the C-terminal sequence of a naturally occurring RT which, in the case of avian 
RTs, contains the integrase domain. RTs of the invention that are longer than naturally 
occurring RT polypeptides contain the sequence of that naturally occurring RT and, in 
addition, sequence of an adjacent peptide region. Additionally, these polypeptides of 
unnatural length may have a polymerizing and/or nucleic acid binding domain added at 
their C-termini. 

Example 4 

The RT constructs described in Example 3 were transformed into prokaryotic and 
eukaryotic host cells and the expression of RT polypeptides was analyzed. A prokaryotic 
host cell, Escherichia coli DH5αF', was transformed with pHRT, pHBRT or pHKRT, 
using a technique standard in the art. Cells subjected to the transformation protocol were 
plated on LB plates (10 g tryptone, 5 g yeast extract, 5 g NaCl, 1 ml of 1 N NaOH, 1.5 g 
agar, ddH2O in a total volume of 1 liter) containing 50 µg/ml ampicillin for selection of 
transformed host cells. Single colonies were picked, expanded in small culture (i.e., 5 
ml), episomal DNAs were rapidly isolated from an aliquot of cells, and the purified DNAs 
were analyzed for the presence of a recombinant molecule of the expected size. 
Dideoxynucleotide-based sequencing of these DNAs confirmed that the first ATG (i.e., the 
initiation codon) was in-frame with the remainder of the RT coding region. 

Another aliquot of those small cultures containing cells transformed with pHRT, 
pHBRT, or pHKRT was used to inoculate flasks containing 10 ml of LB-ampicillin and 
grown at 30°C until an OD600 of 0.6 was reached. Flasks containing these cells were then 
quickly shifted to 42°C to de-repress the λ PR promoter and express the recombinant 
protein. After an hour at 42°C, cells were pelleted and analyzed for expression of protein 
by SDS-PAGE, Western blot analyses, and RT activity assays, as described below. 

In general, about 10% of the expressed protein was recovered in soluble form and 
90% of the expressed protein was found in inclusion bodies, as revealed by pelleting lysed 
cells at 12,000 x g for 5-15 minutes. RT activity was also found when expressing both the 
full-length and the deletion derivatives of the MAV pol coding region from other 
recombinant vectors, such as pTZ18U and pET21d, that contained similar insert fragments 
encoding full-length or C-terminally deleted MAV-RT. 

A eukaryotic host cell suitable for use in practicing the invention is the Sf9 insect 
cell. Several polynucleotides were separately introduced into Sf9 cells using the 
Baculoviral expression system. O'Reilly et al., in Baculovirus Expression Vectors: A 
Laboratory Manual, Oxford University Press (1994). The polynucleotides (i.e., 
pMBacRT, pBacM1BA, pBacM1BAhis, pBacM1KA, and pBacM1KAhis) were purified by 
the standard alkaline lysis method, as described in Sambrook et al. (1989). The DNA was 
then centrifuged through a CHROMA SPIN+TE-400 column (Clontech Laboratories, Inc., 
Palo Alto, CA) at 500 x g for 7 minutes in a swinging bucket rotor (HN-SII centrifuge 
from IEC, Inc.). This purified DNA was then used to transform eukaryotic cells. 

Sf9 insect host cells were prepared for transformation using an established 
procedure. The Sf9 cells from an exponentially growing cell culture were initially counted 
using a hemocytometer and diluted to 5x10'' cells/ml of TNM-FH Insect Cell Medium 
(Product No. T-1032; Sigma Chemical Co., St. Louis, MO) with 10% fetal bovine serum 
(FBS) and antibiotics (50 units/ml nystatin, 50 units/ml penicillin, and 50 µg/ml of 
streptomycin). Subsequently, 1.5 ml of this culture was added to each well of several 
12-well tissue culture plates. The cells were allowed to attach to the plate for a period of 1 
hour. The medium covering the cells was then removed and 2 ml of TNM-FH medium 
without serum was added. The serum-free medium was swirled over the cells and again 
the medium was removed. This process was repeated one more time to remove all traces 
of fetal bovine serum (i.e., FBS) and antibiotics. The cells were then incubated in 
TNM-FH medium for 30 minutes while the transfection mixture was prepared. 

The 50 µl transfection mixture contained 500 ng of DNA, 500 ng of Bsu36I-digested 
BacPak6 viral DNA, and ddH2O. This mixture was gently mixed with 50 µl of 
transfection reagent (Clontech, Inc.) and incubated at room temperature for 15 minutes to 
allow the transfection reagent to form a complex with the DNA, as recommended by the 
supplier of the transfection reagent. 

Medium covering the Sf9 cells was removed and 300-500 µl of TNM-FH medium 
was added to each well. To this medium, the transfection reagent-DNA mixture was added 
drop-wise while gently swirling the dish. The cells were then incubated at 27°C for 5 
hours before adding 2 ml of TNM-FH medium containing 10% FBS and the antibiotics 
identified above. DNA-cell contact was continued at 27°C for 60-72 hours. Medium from 
these plates was then collected and used as primary virus stocks. 

Primary virus stocks were subsequently subjected to plaque purifications by 
standard methods, as described in King et al., in The Baculovirus Expression System: A 
Laboratory Guide (eds. Chapman and Hall, N.Y. 1992), to produce clonal stocks. The 
clonal stocks were amplified using a 1:1 virus to insect cell ratio to produce large quantities 
of recombinant viruses. 

The viruses from the clonal stocks were used to infect insect cells and ultimately 
analyze RT expression in a eukaryotic environment. Based on the titer obtained from the 
plaque assays, an infection was set up using a ratio of 5 viruses per Sf9 cell. After 60 
hours, the medium and cells were collected. The cells were pelleted, resuspended in cell 
lysis buffer (10 mM Tris-HCl, pH 8.0, 50 mM NaCl, 5% glycerol, 0.5% Triton X-100, 
and protease inhibitors (50 µg/ml benzamidine HCl, 0.1 mM 
4-(2-aminoethyl)-benzenesulfonyl fluoride, and 1 µg/ml pepstatin A)) and lysed by 
sonication. These samples were subsequently subjected to SDS-PAGE, Western blot 
analyses, and RT activity assays. 
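
By way of illustration only, the ratio of 5 viruses per Sf9 cell fixes the volume of virus stock to add once the plaque-assay titer and the cell count are known. The following is a minimal sketch of that arithmetic; the function name, the example titer, and the example cell number are assumptions made for the illustration and are not values reported for these experiments.

```python
def virus_stock_volume_ml(cells, viruses_per_cell, titer_pfu_per_ml):
    """Volume of virus stock (ml) that delivers the requested number of
    plaque-forming units per cell, given the stock titer in pfu/ml."""
    if titer_pfu_per_ml <= 0:
        raise ValueError("titer must be positive")
    return cells * viruses_per_cell / titer_pfu_per_ml


# Hypothetical example: 2e6 cells, 5 viruses per cell, stock titer 1e8 pfu/ml.
example_volume_ml = virus_stock_volume_ml(2e6, 5, 1e8)
```

With these hypothetical inputs the required inoculum is 0.1 ml of stock.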

For large-scale expression studies, Sf9 cells were initially grown in T25 tissue 
culture flasks under the conditions described above. Sf9 cells adhering to the T25 tissue 
culture flasks were gently dislodged and adapted to suspension cultures as described by 
King et al., 1992. These suspension cultures were expanded in spinner flasks to a volume 
of 1-3 liters. When the insect cells reached a density of 1x10" cells per ml, they were 
infected with a concentrated stock of recombinant viruses at a ratio of 5:1 viruses per insect 
cell. A variation of a standard protocol was used to infect these cells. A large volume of 
amplified viral stock (M1BA, M1KA, M1BAhis, and M1KAhis, or M1-5,6) was 
concentrated using one-half volume of 40% PEG 8000 and one-sixth volume of 5 M NaCl. 
Precipitated viruses were collected at 12,000 x g for 30 minutes using a Sorvall RC5C 
centrifuge (Dupont, Newtown, CT). The pelleted viruses were resuspended in 1x PBS (10 
mM K-PO4, pH 7.5, and 150 mM NaCl) at 1/20 of the culture volume and stored at 
-20°C. Before infection, the viruses were filtered through a 0.2 µm filter. 
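
As an illustrative aid, the one-half volume of 40% PEG 8000 and one-sixth volume of 5 M NaCl can be converted into absolute additions and final concentrations for any starting volume of viral stock. The sketch below assumes simple additive volumes; the function name and the 600 ml example are hypothetical.

```python
def peg_nacl_additions(stock_volume_ml, peg_stock_pct=40.0, nacl_stock_molar=5.0):
    """Volumes of 40% PEG 8000 (one-half volume) and 5 M NaCl (one-sixth
    volume) to add to a viral stock, with the resulting final concentrations.
    Volumes are assumed to be additive."""
    peg_ml = stock_volume_ml / 2.0
    nacl_ml = stock_volume_ml / 6.0
    total_ml = stock_volume_ml + peg_ml + nacl_ml
    return {
        "peg_ml": peg_ml,
        "nacl_ml": nacl_ml,
        "total_ml": total_ml,
        "final_peg_pct": peg_stock_pct * peg_ml / total_ml,
        "final_nacl_molar": nacl_stock_molar * nacl_ml / total_ml,
    }


# Hypothetical example: 600 ml of amplified viral stock.
example = peg_nacl_additions(600.0)
```

Under the additive-volume assumption, these proportions give a final concentration of 12% PEG 8000 and 0.5 M NaCl regardless of the starting volume.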

After a 48 hour period of infection, 1 ml aliquots of infected cells were collected 
for RT assays to monitor RT expression levels. Cells were harvested at the peak of RT 
expression (generally around 60 hours post-infection), as determined from previous trials. 
Cells were pelleted at 5,000 x g for 30 minutes and stored at -80°C. 

Polypeptides expressed in insect cells were also characterized by SDS-PAGE and 
Western blot analyses. Results of a Western blot assay using a mixture of anti-RT 
monoclonal antibodies 1D8, 2E10, 6F1, 4C4, 9H10 and 9C2 are shown in Fig. 1 (lane 1: 
prestained molecular weight markers of 123 kDa, 90 kDa, 64 kDa, 50 kDa, and 38 kDa; lane 
2: native AMV-RT (nRT); lane 3: M1BAhis; and lane 4: M1KAhis). Further analysis 
of the antigenic properties of M1BAhis and native RT revealed that monoclonal antibody 
6H recognized native RT but failed to recognize the M1BAhis polypeptide. Thus, at least 
one epitope found on native RT is not found on M1BAhis, indicative of structural 
differences between the proteins. 

The results further indicate that both M1BA and M1KA expressed ten-fold more 
RT than M1-5,6, which encodes full-length RT. When cell pellets were assayed for 
RNA-dependent DNA polymerase activity, M1BA was expressed at 10,000 units per liter of 
insect cell culture, whereas M1KA and M1-5,6 were each expressed at 1,000 units per 
liter of insect cell culture. Though M1KA expressed as well as M1BA when analysed on 
Western blots, active M1KA recovered from the cell pellet was ten-fold less than M1BA. 
Most of the expressed M1KA remained insoluble in the pellet. Although the corresponding 
his-tagged proteins (M1BAhis and M1KAhis) were expressed at levels similar to their 
M1BA and M1KA counterparts as revealed by Western blotting, the activities of the 
his-tagged proteins were higher. M1KAhis was expressed at 2,000 units per liter of insect 
cell culture and M1BAhis was expressed at 200,000-400,000 units per liter of insect cell 
culture. 
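
For convenience, the activity yields stated above can be tabulated and compared as fold differences. The sketch below simply restates the reported units per liter as (low, high) pairs; the table layout and the helper name are assumptions made for the illustration.

```python
# Reported RT activity (units per liter of insect cell culture) as (low, high).
UNITS_PER_LITER = {
    "M1-5,6": (1_000, 1_000),
    "M1KA": (1_000, 1_000),
    "M1BA": (10_000, 10_000),
    "M1KAhis": (2_000, 2_000),
    "M1BAhis": (200_000, 400_000),
}


def fold_over(reference, construct):
    """Return the (minimum, maximum) fold difference in activity yield of
    `construct` relative to `reference`."""
    ref_low, ref_high = UNITS_PER_LITER[reference]
    low, high = UNITS_PER_LITER[construct]
    return low / ref_high, high / ref_low


his_tag_gain = fold_over("M1BA", "M1BAhis")
```

On these figures, the his-tagged M1BAhis yields 20- to 40-fold more activity per liter than untagged M1BA, and M1KAhis yields twice the activity of M1KA.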

The Baculoviral system is preferred for expression of RT and fragments thereof. 
A relative comparison of RT expression in prokaryotic and eukaryotic cells, as measured 
by reverse transcriptase assays of purified recombinant and crude protein, revealed that 
His-tagged RT polypeptides from eukaryotic insect cells were most active and stable, while 
untagged polypeptides expressed in prokaryotic cells were less active and stable. 

Recombinantly expressed polypeptides of the invention were purified using 
conventional protocols, with metal-affinity chromatography included for the isolation of 
His-tagged polypeptides. Host cells containing recombinant molecules (i.e., M1-5,6, 
M1BA, M1KA, M1BAhis and M1KAhis) encoding an RT or fragment thereof were 
centrifuged and the cell pellet was solubilized in 20 ml cell lysis buffer (20 mM Tris-HCl, 
pH 8.0, 150 mM NaCl, 0.5% Triton X, and 5% glycerol) per gram of cell pellet. The 
resuspended cells were sonicated with five 30-second bursts at 50% power on ice with 30 
seconds of cooling between each round of sonication. Sonicated cells were then stirred at 
a low speed on a magnetic stirrer at 4°C for one hour to complete cell lysis. The lysed 
samples were centrifuged at 12,000 x g for 30 minutes. The pellet was discarded and the 
supernatant was subjected to column chromatography. 

RTs lacking his tags were purified according to conventional protocols, which 
included removal of cell debris by centrifugation and subjection of supernatants to 
chromatographic purification procedures known in the art. The soluble extract containing 
his-tagged RTs was mixed with a commercially available Ni2+ affinity resin (Ni-NTA 
resin from Qiagen, Inc., Chatsworth, CA), thereby using the his tags for their known 
purpose of facilitating purification via metal affinity chromatography. The extract and 
affinity resin were gently rocked on ice for 1 hour in a 50 ml plastic test tube. The resin 
was then packed in a column and washed with two column volumes of wash buffer (20 
mM Tris-HCl, pH 8.0, 250 mM NaCl, 0.5% Triton X-100, and 5% glycerol) and two 
column volumes of buffer A (20 mM Tris-HCl, pH 8.0, 250 mM NaCl, 0.5% Triton 
X-100, 5% glycerol and 50 mM imidazole). (Of course, the extract could have been applied 
to a pre-formed affinity column and purified using conventional column chromatography, 
as would be understood in the art.) The protein bound to the column was eluted by setting 
up a linear gradient from buffer A to buffer B (20 mM Tris-HCl, pH 8.0, 250 mM NaCl, 
0.5% Triton X-100, 5% glycerol and 250 mM imidazole). 
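
Because buffers A and B differ only in imidazole (50 mM and 250 mM, respectively), the imidazole concentration at any point of the linear gradient follows directly from the fraction of buffer B in the mixture. A minimal sketch of that relationship is given below; the function name is hypothetical, and the gradient volume and fraction sizes are not specified here.

```python
def imidazole_mm(fraction_b, buffer_a_mm=50.0, buffer_b_mm=250.0):
    """Imidazole concentration (mM) of a linear A-to-B gradient at the point
    where buffer B makes up `fraction_b` (0.0 to 1.0) of the mixture."""
    if not 0.0 <= fraction_b <= 1.0:
        raise ValueError("fraction_b must lie between 0 and 1")
    return buffer_a_mm + fraction_b * (buffer_b_mm - buffer_a_mm)


midpoint_mm = imidazole_mm(0.5)
```

At the midpoint of the gradient the eluent therefore contains 150 mM imidazole.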

Fractions from the nickel affinity column that had RT activity were analyzed by 
SDS-PAGE to determine the purity of the protein, as shown in Fig. 2. Fig. 2 presents an 
electrophoretogram of an 8% SDS-PAGE gel stained with Coomassie Blue. The lanes of the 
gel shown in Fig. 2 contain molecular weight markers of 94 kDa, 64 kDa, 43 kDa, 30 kDa 
and 20 kDa (lane 1) and aliquots of fractions obtained from the nickel affinity column (lanes 
2 to 4). The fractions that were greater than 95% pure were pooled and dialyzed against 
storage buffer (200 mM KPO4, pH 7.2, 5 mM DTT, 0.2% Triton X-100 and 50% glycerol). 
Additionally, conventional purification steps may be incorporated into the protocol to 
achieve greater purity, as would be understood in the art. 

Protein concentrations were determined using the Bradford protein assay (BioRad 
Laboratories, Inc., Hercules, CA). Generally, the specific activity of rRT (M1BAhis) was 
calculated to be approximately 30,000-100,000 units/mg, which is similar to the specific 
activity of nRT (30,000-100,000 units/mg). 
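
Specific activity is the enzyme activity divided by the protein mass, here the Bradford-derived concentration. The sketch below illustrates the calculation; the function name and the example preparation (5,000 units/ml at 0.1 mg/ml) are hypothetical and are not measurements from this Example.

```python
def specific_activity(units_per_ml, protein_mg_per_ml):
    """Specific activity in units/mg from activity (units/ml) and protein
    concentration (mg/ml)."""
    if protein_mg_per_ml <= 0:
        raise ValueError("protein concentration must be positive")
    return units_per_ml / protein_mg_per_ml


# Hypothetical preparation: 5,000 units/ml at 0.1 mg/ml protein.
example_units_per_mg = specific_activity(5_000.0, 0.1)
```

The hypothetical preparation works out to about 50,000 units/mg, which would fall inside the range stated above.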

Example 5 

The purified rRT prepared from cultures expressing M1BAhis at 400,000 units/liter 
of culture, a level well beyond a commercially feasible production limit, was found to be 
greater than 95% pure as judged by electrophoretic fractionation using 10% SDS-PAGE. 
The apparent molecular weight of the monomer is 60 kDa, which compares well with the 
calculated molecular weight of approximately 59.5 kDa. The recombinant protein was 
analyzed on a 12.5% polyacrylamide non-denaturing gel for the presence of monomers 
and polymers (e.g., dimers) using the Pharmacia Phast System. The protein sample was 
prepared in either of two ways. One aliquot was completely denatured by heating at 
100°C for three minutes in treatment buffer (0.125 mM Tris-HCl, pH 6.5, 4% SDS, 20% 
glycerol, 10% β-mercaptoethanol). Another aliquot was partially denatured at 70°C in 
treatment buffer without 2-mercaptoethanol. Under completely denaturing conditions, rRT 
was observed to migrate at approximately 66 kDa (BSA marker) and the partially denatured 
samples had additional bands ranging from 60-200 kDa, indicating that rRT formed 
polymers. Protein size determinations were confirmed using molecular sieve 
chromatography with Superose 12 HR 10/30 (separation range of 1-300 kDa), as described 
above, which revealed that the majority of the rRT eluted between beta amylase 
(approximately 200 kDa) and apoferritin (443 kDa). Thus, the rRT was predominantly in 
a polymeric form. Without wishing to be bound by theory, the addition of C-terminal 
histidine residues may have provided a polymerization capacity, perhaps by complex 
formation via metal (e.g., nickel) chelation, to substitute for the loss of that capacity 
attributable to the integrase domain, which had been deleted. Thus, the invention 
contemplates RT polypeptides having C-terminal attachments in the form of compounds 
capable of promoting polymer formation. Suitable compounds would include, but are not 
limited to, a plurality of basic or acidic amino acids, as well as Cys residues capable of 
disulfide bond formation. 
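
As a rough illustration of what the molecular sieve result implies, the bracketing standards (approximately 200 kDa and 443 kDa) can be divided by the calculated monomer mass of approximately 59.5 kDa. The sketch assumes that elution position tracks mass as it does for the globular standards, which may not hold exactly for rRT; the function name is hypothetical.

```python
MONOMER_KDA = 59.5  # calculated molecular weight of the rRT monomer


def subunit_range(low_kda, high_kda, monomer_kda=MONOMER_KDA):
    """Approximate number of monomers per complex for a species eluting
    between two size standards of the given masses (kDa)."""
    return low_kda / monomer_kda, high_kda / monomer_kda


low_n, high_n = subunit_range(200.0, 443.0)
```

Under that assumption, the bulk of the rRT would correspond to assemblies of roughly three to seven monomers.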

Expressed rRT was also characterized immunochemically. Monoclonal antibodies 
against AMV reverse transcriptase were prepared using techniques well known in the art. 
See Harlow et al., Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, 
Cold Spring Harbor, New York (1988). Briefly, spleen cells from a mouse that had been 
immunized with RT were fused with mouse myeloma cells to make hybridomas. These 
hybridomas were allowed to grow into colonies in 96-well plates; supernatants from these 
wells were then tested to find hybridomas that appeared to make anti-RT antibodies. 
Further testing confirmed these results. 

To prepare spleen cells for hybridoma production, a BALB/C mouse (female, ten 
weeks old, obtained from Harlan Sprague Dawley, Madison, WI) was immunized by 
several intraperitoneal injections with AMV-RT (Molecular Biology Resources, Inc.) using 
a conventional immunization schedule. To prepare RT for injection, the storage buffer was 
removed from purified RT by diluting the enzyme in phosphate-buffered saline (PBS) and 
reconcentrating it using a Centricon 30 concentrator (Amicon Corp.). The concentrated 
RT was then diluted again in PBS and emulsified with an equal volume of an adjuvant. 
For the initial injection, the adjuvant was complete Freund's adjuvant (Sigma Chemical 
Co.); for the booster injections, the Ribi Adjuvant System (Ribi Immunochem Research, 
Inc., Hamilton, MT) was used. The dose of RT was approximately 20 micrograms per 
injection. The injections were made over a period of eight months, with successive 
intervals of five weeks, four weeks, three weeks, eleven weeks, eight weeks, and three 
weeks. The fusion was performed five days after the final boost. 

For the fusion experiment, the mouse was sacrificed and spleen cells were isolated 
and fused with myeloma cells (P3X63-AG8.653, ATCC CRL 1580), using procedures well 
known in the art. See Harlow et al. In particular, the cells were fused in 50% 
polyethylene glycol, resuspended in a selection medium (i.e., HAT medium), and 
distributed into the wells of fourteen 96-well plates. After three weeks of growth, 
approximately 350 wells contained hybridoma colonies. 

Hybridomas making anti-RT antibodies were identified by ELISA. For this 
procedure, the wells of 96-well polystyrene ELISA plates were first coated with purified 
RT (2 micrograms RT/ml in 100 mM Tris-HCl, pH 8.5, 0.05% NaN3; overnight 
incubation at room temperature), then washed with TBST (Tris-buffered saline, pH 8.5, 
0.05% Triton X-100) to remove excess RT. For the assay itself, the wells were filled with 
95 microliters of TBST plus 5 microliters of hybridoma culture supernatant. The plates 
were incubated at room temperature for two hours, then washed with TBST to remove 
unbound immunoglobulin. To detect wells with anti-RT antibodies, peroxidase-conjugated 
goat anti-mouse IgG (heavy-chain specific; Jackson ImmunoResearch, West Grove, PA) 
was diluted 5,000-fold into TBST and added to the wells of the ELISA plates. After the 
wells had been incubated for one hour at room temperature, the unbound peroxidase 
conjugate was removed by thorough washing of the plates with TBST. Wells positive for 
RT were visualized colorimetrically following addition of the substrate 
3-methyl-2-benzothiazolinone hydrazone/3-dimethylaminobenzoic acid/hydrogen peroxide 
to detect immobilized HRP. Hybridomas from positive wells were repeatedly cloned by 
limiting dilution until all wells with growth were ELISA-positive. 
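
The dilutions used in the ELISA reduce to two simple relationships, sketched below for illustration. The helper names and the 10 ml working volume are assumptions; only the 5 microliter plus 95 microliter sample dilution and the 5,000-fold conjugate dilution come from the procedure above.

```python
def fold_dilution(sample_ul, diluent_ul):
    """Fold dilution obtained by adding `sample_ul` of sample to
    `diluent_ul` of diluent."""
    return (sample_ul + diluent_ul) / sample_ul


def stock_volume_ul(final_volume_ul, fold):
    """Volume of stock (µl) needed to prepare `final_volume_ul` of a
    `fold`-fold dilution."""
    return final_volume_ul / fold


supernatant_fold = fold_dilution(5.0, 95.0)          # sample in each well
conjugate_ul = stock_volume_ul(10_000.0, 5_000.0)    # for a hypothetical 10 ml
```

Each supernatant is thus assayed at a 20-fold dilution, and a hypothetical 10 ml of working conjugate would require 2 microliters of stock.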

Supernatants from wells that tested positive by ELISA were further screened by 
immunoprecipitation of RT using techniques well known in the art. See Harlow et al. The 
immunoprecipitation assay relies on the presence of protein A (which binds IgG) on the 
surface of Staphylococcus aureus cells (SAC, Sigma Chemical Co.). Since protein A does 
not bind strongly to mouse IgG, a pellet of centrifuged SAC cells was first treated with 
rabbit anti-mouse IgG antibodies. The pellet from 10 microliters of a 10% suspension of 
these cells was then incubated with 50 microliters of hybridoma culture supernatant for 2 
hours at room temperature. The resultant SAC cells were centrifuged, washed, and 
resuspended in diluted RT. The RT-cell suspensions were incubated for 3 hours at 4°C 
and centrifuged. The resultant supernatants were removed and tested for depletion of RT 
activity using a standard radiochemical assay. 

Six hybridoma lines tested positive in both the ELISA and immunoprecipitation 
assays. These lines were designated 1D8, 2E10, 4C4, 6F1, 9C2 and 9H10. All six 
monoclonal antibodies had gamma-1, kappa isotypes. 

The form of active rRT (i.e., monomer or polymer) was confirmed using ELISA 
in a sandwich format with anti-RT monoclonal antibodies. Initially, monoclonal antibody 
was immobilized in DNA-BIND plates (Costar Corp., Cambridge, MA). The plate was then 
blocked with HSA to prevent non-specific binding. The wells were then incubated with 
purified rRT (i.e., M1BAhis). Excess or unbound protein was removed by washing with 
phosphate-buffered saline. The wells were then incubated with the same monoclonal 
antibody linked to biotin for detection. If the rRT existed as a monomer, the biotin-linked 
monoclonal antibody should not bind to it, because the single copy of the epitope would 
already be occupied by the immobilized antibody. However, the biotin-linked monoclonal 
antibody did bind to the rRT, indicating that the rRT had formed a polymer. 

To determine the purity of the samples containing reverse transcriptase, 
recombinant protein expressed from each of a variety of clones (e.g., M1BAhis) and found 
in either the solubilized cell pellets or protein fractions from the different chromatographic 
columns used in purification was subjected to SDS-PAGE. Samples were electrophoresed 
on 8% polyacrylamide gels containing 6% stacking gels, followed by Coomassie Blue 
R-250 staining using standard protocols (Sambrook et al., 1989). The recombinant protein 
was found to be greater than 95% pure. 

Using the Pharmacia Phast System, the recombinant (M1BAhis) and native reverse 
transcriptase, as well as appropriate standards supplied with the system (i.e., IEF 3-9), 
were subjected to isoelectric focusing electrophoresis (Pharmacia-Upjohn, Piscataway, 
NJ). The experimentally determined pI values of the nRT and rRT were 6.0. The 
theoretical pI of rRT, calculated from its amino acid sequence, was 6.8. 

For analyses of total expression, host cells containing one of several recombinant 
DNAs (i.e., pMBacRT, pBacM1BA, pBacM1KA, pBacM1BAhis, and pBacM1KAhis) were 
induced to express recombinant protein. The induced cells were pelleted at 12,000 x g for 
5 minutes. The cell pellet was then resuspended in SDS sample buffer (Sambrook et al., 
1989) or cell resuspension buffer (20 mM Tris-HCl, pH 8.0, 250 mM NaCl, 0.5% Triton 
X-100, and 5% glycerol) to assess the solubility of the protein. Resuspended cells were 
pulse-sonicated three times at a setting of 3 (Virsonic 100 from Virtis Company, Inc., 
Gardiner, NY) for 20 seconds each (500 mM Tris-HCl, pH 6.5, 14% SDS, 30% glycerol, 
DTT, and 0.012% bromophenol blue). Small aliquots of the samples in SDS sample 
buffer were loaded on duplicate gels and electrophoresed. One of the duplicate gels was 
stained with Coomassie Blue and the other gel was used to transfer protein to a 0.2 µm 
nitrocellulose membrane using a Bio-Rad transfer apparatus for Western blot analysis 
(Bio-Rad Laboratories, Inc., Hercules, CA). Detection of expressed protein in fractionated 
crude lysates was possible using specific, monoclonal anti-RT antibodies (a mixture of 
monoclonal antibodies 4C4, 1D8, 2E10, 6F1, 9H10, and 9C2; Molecular Biology 
Resources, Inc., Milwaukee, WI) to detect the recombinant protein. 

In practice, the nitrocellulose membrane containing the transferred protein was 
contacted with a blocking buffer (5% casein hydrolysate, 150 mM NaCl, 10 mM Tris-HCl, 
pH 8.0) for 30 minutes followed by incubation with a 1:1000 dilution of anti-RT 
monoclonal antibody in blocking buffer. After overnight incubation, blots were rinsed in 
wash buffer (10 mM Tris-HCl, pH 8.0, 150 mM NaCl, 0.5% Tween 20) and incubated 
with a 1:5000 dilution of alkaline phosphatase-conjugated goat anti-mouse antibody in 
blocking buffer for 1 hour. Subsequently, the blots were rinsed 3x with wash buffer and 
1x with AP buffer (100 mM Tris-HCl, pH 9.5, 5 mM MgCl2 and 100 mM NaCl). RT was 
indirectly detected by performing a colorimetric phosphatase assay using a standard 
substrate mixture of NBT (nitroblue tetrazolium; 75 mg/ml in dimethylformamide) and 
BCIP (5-bromo-4-chloro-3-indolyl phosphate; 50 mg/ml in dimethylformamide), which 
forms a blue precipitate when dephosphorylated by any immunologically immobilized 
phosphatase. The anti-RT antibody recognized two bands, one at approximately 61 kDa 
and one at 92 kDa, in the lane containing native RT. In the lane containing recombinant, 
His-tagged RT expressed from M1BAhis (alpha fragment equivalent), a single band at 
approximately 60 kDa was found; in the lane containing recombinant, His-tagged RT 
expressed from M1KAhis (beta fragment equivalent) a single 91 kDa band was found. 

Assays were also performed to determine the intrinsic/extrinsic exonuclease, 
endonuclease (i.e., nicking), DNase, and RNase activities of the rRT. An assay for 
3'->5' exonuclease activity was performed using radiolabeled TaqI fragments of lambda 
DNA as a substrate. The 3' ends of TaqI-digested lambda DNA fragments (265 µg) were 
labeled with 60 µCi [3H]-dCTP (57.4 µCi/mmole) and 60 µCi [3H]-dGTP (8.9 µCi) using 
40 units of exo- Klenow fragment of DNA polymerase in a standard labeling reaction. 
Sambrook et al., (1989). The 3'->5' exonuclease assay was performed in a final volume 
of 10 µl containing 50 mM Tris-HCl, pH 7.6, 10 mM MgCl2, 1 mM DTT, 0.015 µg of 
labeled TaqI fragments of λ DNA, and either 2.5 or 10 units of RT enzyme. One unit of 
RT enzyme is the amount of enzyme required to incorporate 1 nmol of dTTP into an 
acid-insoluble form in 10 minutes at 37°C under the stated assay conditions (see Example 6). 
Each sample was incubated at 37°C for 1 hour. The reaction was terminated by the 
addition of 50 µl of yeast tRNA and 200 µl of 10% trichloroacetic acid. After incubation 
for 10 minutes on ice, the samples were centrifuged for 7 minutes in a microcentrifuge. 
The supernatant (200 µl), which contained the released label, was removed and added to 
6 ml of scintillation fluid and counted in a scintillation counter. The results showed that 
the rRT released 0.13% of the label, an acceptably low level of 3'->5' exonuclease 
activity. 
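
A percentage of label released can be derived from the scintillation counts along the following lines. This is a sketch under stated assumptions, not the calculation actually used: it assumes that the 200 µl counted is a representative portion of the 260 µl stopped reaction (10 µl reaction, 50 µl tRNA, 200 µl TCA), that the total input counts are known, and that no background is subtracted. The function name and the example counts are hypothetical.

```python
def percent_label_released(supernatant_cpm, input_cpm, counted_ul=200.0,
                           reaction_ul=10.0, trna_ul=50.0, tca_ul=200.0):
    """Percent of input label found in the acid-soluble fraction, scaling
    the counted supernatant portion up to the whole stopped reaction."""
    total_ul = reaction_ul + trna_ul + tca_ul
    released_cpm = supernatant_cpm * total_ul / counted_ul
    return 100.0 * released_cpm / input_cpm


# Hypothetical counts: 200 cpm in the counted supernatant, 100,000 cpm input.
example_percent = percent_label_released(200.0, 100_000.0)
```

With these hypothetical counts the released fraction is 0.26% of the input label.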

The rRT was also subjected to a 5'->3' exonuclease assay, using radiolabeled 
HaeIII fragments of λ DNA. The λ fragments were 5' end-labeled using 60 µCi [γ-32P] 
dATP (2,000 Ci/mmol) and 40 units of T4 polynucleotide kinase in a conventional 
procedure. Sambrook et al., (1989). Except for the use of 5' end-labeled HaeIII 
fragments as substrate, this assay was performed in accordance with the description of the 
3'->5' exonuclease assay above. The purified rRT released 0.36% of the label into an 
acid-soluble form, an acceptably low level of 5'->3' exonuclease activity. 

Double-stranded and single-stranded DNase assays were also performed using the 
protocol for the 3'->5' exonuclease assay, again with the exception of the type of labeled 
substrate being used. For each of the DNase assays, intact lambda DNA (0.5 µg) was 
labeled with 30 µCi [α-32P] dATP (2,000 Ci/mmol) using the random primer extension 
technique understood in the art. Each assay used 0.015 µg of labeled λ DNA. For 
single-stranded DNase assays, the labeled λ DNA fragments were further subjected to heat 
denaturation (3 minutes at 100°C followed immediately by chilling on ice) to prepare the 
substrate. Again with the exception of the type of substrate employed, each of the DNase 
assays was conducted as described above in the context of the 3'->5' exonuclease assay. 
The rRT released 0.5% of the label in the double-stranded DNase assay; 0.02% of the 
label was released in the single-stranded DNase assay. Both results indicate acceptably low 
levels of DNase activities. The purified rRT was also subjected to an endonuclease, or 
nicking, assay by examining the extent to which the rRT converted a supercoiled substrate 
in the form of pBR322 to a relaxed form, as visualized by agarose gel electrophoretic 
fractionation. The assay for endonuclease activity was performed in a final volume of 10 
µl containing 50 mM Tris-HCl, pH 7.6, 10 mM MgCl2, 1 mM β-mercaptoethanol, 0.5 µg 
pBR322, and 2.5, 5 or 10 units of enzyme. Each sample was incubated at 37°C for 1 
hour. Two microliters of 0.25% bromophenol blue, 1 mM EDTA and 40% sucrose were 
added to stop the reaction. After a brief centrifugation, 6 µl of the sample were 
electrophoresed on a 1.0% agarose gel in 1X TBE. Sambrook et al., (1989). The results 
showed that less than 10% of the supercoiled substrate was converted to a relaxed form, 
an acceptably low level of nicking activity. 

The rRT was also characterized in terms of its RNase activity. In particular, this 
assay was designed to measure general RNase activity and, specifically, not an RNase H 
activity. Substrate was prepared using run-off transcription from a T7 promoter in the 
presence of [α-32P] UTP. In particular, the plasmid pPV2 (a pTZ-based vector containing 
a ColE1 ori; an ampicillin selectable marker; T7, T3 and lac promoters; and a 695 bp 
insert from plum pox virus) was linearized with PvuII. The run-off transcription reaction 
was performed with 1 µg of linearized pPV2, 30 µCi of [α-32P] dATP (2,000 Ci/mmol), 
and 10 units of T7 RNA polymerase using a conventional procedure. The RNase assay 
was then performed in the presence of single-stranded RNA substrate (0.15 µg) and rRT 
(2.5, 5 or 10 units). Released label was again recovered as acid-soluble material using the 
TCA precipitation procedure described above. Scintillation counting showed that 1% of 
the radiolabel was released, indicating an acceptably low level of RNase activity. 

Example 6 

The RNA-dependent DNA polymerase activities of native RT and recombinant RT 
(purified expression product of M1BAhis) were compared. One unit of enzyme was 
compared in RT assays with either poly rA:dT12-18 (20:1) or mRNA as substrate. Product 
quantity was determined by either glass filter precipitation or binding to DE52 filters; 
product quality was monitored by autoradiography of a 1.2% TBE agarose gel containing 
fractionated reaction products. 

The reverse transcriptase activities of the native and recombinant proteins were 
compared using a modification of a procedure described by Myers et al., Biochemistry 
30:7661-7666 (1991). The reaction mixture contained 1x reaction buffer (50 mM 
Tris-HCl, pH 8.3, 40 mM KCl, 10 mM MgCl2), 1 mM DTT, 0.4 mM poly rA:dT18, 0.5 µCi 
[α-32P] TTP (3,000 Ci/mmol), 0.5 mM dTTP, 1 unit of enzyme (one unit of RT enzyme 
is the amount of enzyme required to incorporate 1 nmol of dTTP into an acid-insoluble 
form in 10 minutes at 37°C under the stated assay conditions), and ddH2O to 50 µl total 
volume. Reaction mixtures without enzyme were pre-incubated at 37°C for 1 minute prior 
to the addition of enzyme. Reactions were then incubated at 37°C for 20 minutes, and 
terminated by adding 2 µl of 0.5 M EDTA followed by applying 40 µl of each reaction 
mixture to separate DE52 filter membranes. The filters were washed three times with 5% 
Na2HPO4 for 5 minutes each, then rinsed with ddH2O followed by 95% ethanol. The 
filters were air dried, placed in scintillation fluid, and immobilized radioactivity was 
quantitated. 
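
The unit definition (1 nmol of dTTP incorporated in 10 minutes at 37°C) can be applied to filter counts as sketched below. The sketch is illustrative and rests on assumptions not stated in the procedure: incorporation is linear over the incubation, the labeled nucleotide adds negligible mass to the 25 nmol of unlabeled dTTP (0.5 mM in 50 µl), counting efficiency is the same for the filter and for the total-count reference, and the 40 µl applied is a representative portion of the 52 µl stopped reaction. The function name and the example counts are hypothetical.

```python
def rt_units(filter_cpm, total_cpm, dttp_nmol=25.0, spotted_ul=40.0,
             stopped_volume_ul=52.0, minutes=20.0):
    """RT units in a reaction, where one unit incorporates 1 nmol dTTP in
    10 minutes.

    filter_cpm: counts retained on the DE52 filter
    total_cpm:  counts in the whole reaction before filtration
    """
    cpm_per_nmol = total_cpm / dttp_nmol               # specific radioactivity
    nmol_on_filter = filter_cpm / cpm_per_nmol         # dTTP incorporated, spotted part
    nmol_in_reaction = nmol_on_filter * stopped_volume_ul / spotted_ul
    return nmol_in_reaction * 10.0 / minutes           # normalize to 10 minutes


# Hypothetical counts chosen so that the result is one unit.
example_units = rt_units(filter_cpm=64_000.0, total_cpm=1_040_000.0)
```

With the hypothetical counts shown, 2 nmol of dTTP is incorporated in 20 minutes, corresponding to one unit of enzyme.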

A variation of the filter assay was used to compare the quantity and quality of 
reaction products. Messenger RNAs, an 891 bp control and a 7.5 kb mRNA, were obtained 
from GIBCO BRL, Gaithersburg, MD. The following substitutions in the assay described 
above were made: 1 µg of mRNA primed with 0.5 mM oligo dT12-18 primer, instead of 
poly rA:dT18; and mixed dNTPs (0.5 M M each of dGTP, dATP, TTP and dCTP, and 0.02 
µCi [α-32P]-dATP (6,000 Ci/mmol)), instead of [α-32P] dTTP. Reactions were initiated 
by adding 5 units of RT to the reaction mixture. After 1 hour of incubation at 37°C, a 20 
µl sample was removed and mixed with 5 µl of stop solution (95% de-ionized formamide, 
10 mM EDTA, 0.05% xylene cyanol FF, and 0.05% bromophenol blue) and loaded onto 
a 1.2% TBE agarose gel along with a 1 kb ladder of standards (Chimerx, Madison, WI). 
Gel samples were electrophoresed at 100 volts for approximately 2 hours and dried. Dried 
gels were autoradiographed at -70°C for 3 days and developed to visualize bands. The 
results are presented in Fig. 3, which presents autoradiographic data showing 
size-fractionated reverse transcriptase products using poly A-tailed mRNA as a template and 
oligo dT12-18 primers. In particular, the template was 891 nucleotides (lanes 2 and 4) or 
7,500 nucleotides (lanes 1 and 3); nRT was used in reactions analyzed in lanes 3 and 4, 
while rRT was used in reactions analyzed in lanes 1 and 2. Both the native and recombinant 
RTs produced products of 891 bp and 7.5 kb, depending on the size of the mRNA template. 

Example 7 

The properties of native MAV-RT and recombinant RT were compared. In 
particular, optima for temperature, pH, magnesium ion concentration, and other divalent 
cation (i.e., calcium, copper, manganese and zinc) concentrations were determined. 
a) Temperature optima 

The RNA-dependent DNA polymerase activity of native MAV-RT and recombinant 
MAV-RT (i.e., M1BAhis) were compared in RT assays conducted at different 
temperatures. 

The relative RT activities of the enzymes were compared between 37°C and 70°C 
at pH 8.0. The activity assays were performed in a 50 µl reaction mixture, containing 50 
mM Tris-HCl, pH 8, 40 mM KCl, 10 mM MgCl2, 1.34% trehalose, 2% maltitol, 1 mM 
DTT, 0.5 mM poly rA:dT18, 0.5 mCi [3H]-dTTP (70-90 Ci/mmol), 0.5 mM dTTP, 5 U 
enzyme (rRT or nRT), and ddH2O. Duplicate reactions were incubated at each 
temperature for 10 minutes. Products were quantitated by determining the [3H]-dTTP 
incorporated using a scintillation counter. The results are presented as counts per minute 
as a function of temperature in degrees Celsius, as shown in Fig. 4A (black: rRT (i.e., 
M1BAhis); hatched: nRT). These results reveal that the optimum temperature for both 
nRT and rRT in RT assays was 55°C. 

The temperature profiles of nRT and rRT (i.e., M1BAhis) in RAMP assays were
also determined. RAMP reactions were conducted as described in PCT/US97/04170,
incorporated herein by reference in its entirety. In particular, the target nucleic acid being
amplified was Cryptosporidium mRNA from one oocyte. As described in detail in Example
8, this mRNA target was reverse transcribed into cDNA at different temperatures using 20
units of native RT or 15 units of recombinant RT. The results are presented as absorbance at
450 nm as a function of temperature in degrees Celsius, as shown in Fig. 4C (hatched line:
standard; closed circles: rRT (i.e., M1BAhis)). The actual absorbance values at the
various temperatures are shown below the figure (upper row: standard; lower row: rRT).

b) pH optima

The relative RT activities of nRT and rRT in reactions at various pH values were
also compared. Two sets of comparative reactions were designed: one set incubated at a
conventional temperature of 37°C, the other set incubated at a 60°C temperature suitable
only for thermostable enzymes.

The pH values of selected buffers were adjusted at room temperature. Activity
assays were performed in a 50 µl reaction mixture, containing 40 mM KCl, 10 mM
MgCl2, 1 mM DTT, 0.5 mM poly rA:dT15, 0.5 mCi [3H]-dTTP (70-90 Ci/mmol), 0.5 mM
dTTP, 5 units enzyme, ddH2O, and 50 mM Tris-HCl (pH 6, 7, 8, 8.3, 9, or 9.5).
Reactions were incubated at 37°C or 60°C for 10 minutes. Products were quantitated by
determining the [3H]-dTTP incorporated as counts per minute using a scintillation counter,
with the activities of nRT and rRT (i.e., M1BAhis) under the various pH conditions being
shown in Figs. 4D and 4E (black: rRT (i.e., M1BAhis); gray-hatched: nRT). The data
in the figures establish that the optimum pH for nRT and rRT is pH 8.

c) Mg++ ion optima

The RT assay described in Example 7(b) was modified to determine the influence
of MgCl2 concentration on the activities of the native and recombinant RTs. The reaction
buffer contained 50 mM Tris-HCl, pH 8.3, and MgCl2 ranging in concentration from 0-100
mM; all other reaction components were as described in Example 7(b). The reactions were
incubated at 37°C. Incorporated [3H]-dTTP was measured by scintillation counting, with
the results presented as counts per minute. The optimum MgCl2 concentration was found
to be 5 mM for both nRT and rRT, as shown in Fig. 4F (black: rRT (i.e., M1BAhis);
gray-hatched: nRT).

d) Other divalent cation requirements 

The reaction described above in the context of determining Mg++ concentration
optima was modified to determine the influence of different divalent cations on RT
activity. The reaction buffer included 50 mM Tris-HCl, pH 8.3, and 10 mM of the
chloride salt of a divalent cation (MgCl2, CuCl2, MnCl2, ZnCl2, or CaCl2). Independent
experiments were performed and a curve was constructed. Fig. 4G shows the activities of
the enzymes as counts incorporated as a function of the cation used in the reaction (black:
rRT (i.e., M1BAhis); gray-hatched: nRT). As shown in Fig. 4G, maximal activity of both
nRT and rRT (i.e., M1BAhis) was achieved using magnesium as the divalent cation.

Example 8 

Conceptually, RT-PCR consists of a pre-amplification reaction followed by an
amplification reaction. The pre-amplification reaction involves the use of reverse
transcriptase to synthesize the first strand of cDNA using a CAT (i.e., chloramphenicol
acetyltransferase) mRNA as template. The CAT mRNA was provided in the Superscript
kit from GIBCO-BRL, and the reaction was performed according to the supplier's
recommendations. Following this reaction, the RNA from the RNA-DNA hybrid was
removed by RNase H to free the first strand for use as a template in a Polymerase Chain
Reaction (PCR).

The pre-amplification reaction mixture initially consisted of 50 ng of control mRNA
(i.e., CAT mRNA), 500 ng of oligo dT12-18, and ddH2O to bring the mixture to a total
volume of 12 µl. This mixture was incubated at 70°C for 1 minute. Subsequently, 2 µl
of 10x PCR buffer, 2 µl of 25 mM MgCl2, 1 µl of dNTP from a combined stock solution
containing 10 mM each of dGTP, dATP, TTP and dCTP, and 2 µl of 0.1 M DTT were
added to the mRNA/oligo dT mixture. One set of reactions was incubated with 20 U of
nRT and the other set of reactions was incubated with 20 U of rRT (i.e., M1BAhis). One
tube from each set was incubated at one of several temperatures and each reaction
proceeded for one hour. The reactions were terminated by incubation at 90°C for 2
minutes. Reactions were then cooled on ice and 1 µl of RNase H was added to each tube
and incubated at 37°C for 20 minutes.

For the amplification reactions, each reaction mixture was assembled in a thin-wall
tube containing: 5 µl of 10x PCR buffer, 3 µl of 25 mM MgCl2, 1 µl of dNTP from a
combined stock solution containing 10 mM each of dGTP, dATP, TTP and dCTP, 1 µl
each of 10 µM amplification primer 1 and 10 µM amplification primer 2 as supplied in the
Superscript kit, 1 µl of Taq DNA polymerase and Pyrococcus woesei (i.e., Pwo) DNA
polymerase mix (Boehringer Mannheim Corp., Indianapolis, IN), 2 µl of the cDNA
mixture from the first-strand synthesis reaction and ddH2O to 50 µl total volume. Reaction
products were analyzed by subjecting 5 µl of the reaction to fractionation on a 1.2% TBE
agarose gel and determining the intensity of the bands, in ng of DNA, using an imager
equipped with a DC40 camera and Kodak Digital Science 1D™ software. The quantity
of DNA synthesized by rRT was comparable to the quantity synthesized by nRT.
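
The final concentrations implied by the amplification mixture can be checked with the dilution relation C1 x V1 = C2 x V2. The following is a minimal sketch and not part of the disclosed protocol; the function name `final_concentration` and the dictionary layout are illustrative assumptions, and only components whose stock concentration and added volume are legibly stated above are included.

```python
# Illustrative check of final concentrations in the 50 µl amplification
# reaction described above, using C1 * V1 = C2 * V2.
# Function and variable names are hypothetical, chosen for this sketch.

TOTAL_VOLUME_UL = 50.0  # total reaction volume stated in the text


def final_concentration(stock_conc, volume_ul, total_ul=TOTAL_VOLUME_UL):
    """Return the concentration after diluting volume_ul of stock into total_ul.

    The result is expressed in the same unit as stock_conc.
    """
    if total_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock_conc * volume_ul / total_ul


# name -> (stock concentration, unit, volume added in µl), taken from the text
components = {
    "PCR buffer": (10.0, "x", 5.0),
    "MgCl2": (25.0, "mM", 3.0),
    "each dNTP": (10.0, "mM", 1.0),
}

final = {
    name: (final_concentration(conc, vol), unit)
    for name, (conc, unit, vol) in components.items()
}

for name, (conc, unit) in final.items():
    print(f"{name}: {conc:g} {unit}")
```

Under these inputs the buffer is diluted to 1x, which is the usual sanity check that the stated volumes and the 50 µl total are mutually consistent.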

The results showed that the temperature optimum for RT-PCR was 60°C using
either nRT or rRT, as shown in Fig. 4B (results are presented as ng of PCR products
produced as a function of temperature in degrees Celsius, with open squares indicating rRT
(i.e., M1BAhis) and solid squares indicating nRT). The quantity of gene-specific products
was greater at 60°C than at 37°C. The optimum temperature for RNA-dependent DNA
polymerase activity for both nRT and rRT was 55°C (see Example 7a and Fig. 4A). The
differences in temperature optima are probably due to the need for both DNA-dependent
DNA polymerase and RNase H activities (having different temperature optima) in RT-PCR.

Rapid Amplification (i.e., RAMP) is an amplification technique disclosed in
International Application Serial No. PCT/US97/04170. A RAMP reaction was also
performed using an RT according to the invention and a first strand of cDNA from a
Cryptosporidium oocyte mRNA as a template, along with a nicking enzyme (i.e.,
BsiHKCI) and Bst DNA polymerase. The Bst DNA polymerase provided both
polynucleotide synthesis activity and strand displacing activity.

The reaction consisted of 35 mM potassium phosphate, 0.7 mM Tris-HCl, pH 7.9, 1.4 mM
dCTP, 0.5 mM each of dATP, dGTP and dTTP, 35 mg of Bovine Serum Albumin, 10.2
mM MgCl2, 3.4 mM KCl, 0.7 mM DTT, 2% maltitol, 1.34% trehalose, 0.5 mM of
Amplification Primer 1 (5'-ACCCCATCCAATGCATGTCTCGGGTCGTAGTCTTAACCAT-3';
SEQ ID NO:31) and Amplification Primer 2
(5'-CGATTCCGCTCCAGACTTCTCGGGTGCTGAAGGAGTAAGG-3'; SEQ ID NO:32) and 1% glycerol.
To each reaction, 15 units of rRT (i.e., M1BAhis) or 20 units of nRT were added, along
with 36 units of Bst DNA polymerase and 250 units of BsiHKCI in a total volume of 10

The amount of product synthesized in each reaction was measured by a plate assay.
The plate assay consisted of a gene-specific capture primer
(5'-AAACTATGCCAACTAGAGATTGGAGGTTGTTT-3'; SEQ ID NO:30) bound to the
wells of a microtiter plate and used to capture the product. The captured product was then
detected by an oligonucleotide (HRP-conjugated P2 Comp; SEQ ID NO:37) linked to
horseradish peroxidase. The amount of bound HRP was detected by a colorimetric assay
standard in the art.

The amount of product synthesized by the rRT was twofold greater than the quantity
synthesized by nRT at temperatures from 55°C to 64°C, as shown in Fig. 4C. The
difference in temperature optima between the RT assays and the amplifications may be due,
in part, to the differences in the relative RNase H activities at the assessed temperatures.
The lowest RNase H activity was seen between 60°C and 65°C, temperatures that also
produced longer cDNA products and greater amplification of templates. The temperature
profile of the RNase H activity of rRT is shown in Fig. 6B.

Example 9 

In addition to RNA-dependent DNA polymerase activity, MAV-RT has additional
enzyme activities, such as DNA-dependent DNA polymerase activity. The
DNA-dependent DNA polymerase activity was investigated using a single-stranded M13mp18
DNA template and a sequence-specific [γ-32P]-labeled primer (i.e., Forward Sequencing
Primer or FSP; 5'-CGCCAGGGTTTTCCCAGTCACGA-3'; SEQ ID NO:29). The 10
µl reaction mixture contained 50 mM Tris-HCl, pH 8.3, 40 mM KCl, 10 mM MgCl2, 20
µM of each conventional dNTP, 0.24 pmol of sequence-specific primer FSP and 800 ng
of single-stranded M13mp18 DNA template. Four units of rRT (i.e., M1BAhis) and 5
units of nRT were compared to a commercially available thermostable DNA polymerase
(Sequitherm; 5 units) using the buffer provided in the kit (Sequitherm Cycle Sequencing
kit, Epicentre Technologies, Madison, WI). The DNA-dependent DNA polymerase
activities of nRT and rRT were approximately equivalent.

The DNA-dependent DNA polymerase activity was also determined at different
temperatures. For these reactions, incorporated [α-32P]-dTTP served as a label and a
non-radioactive primer was used. The reaction consisted of 200 ng of single-stranded
M13mp18 DNA, 1-5 pmoles of FSP, 50 mM Tris-HCl, pH 8.3, 40 mM KCl, 10 mM
MgCl2, 1 mM DTT, O.o µCi of [α-32P]-dTTP (3,000 Ci/mmol), and 20 µM each of dATP,
dGTP, dCTP, and dTTP, in a total volume of 24 µl. A conventional protocol was used
for the reactions (Sambrook et al., (1989)) and the reactions were terminated by adding
2 µl of 10 mM EDTA (0.8 mM final concentration). The incorporated [α-32P]-dTTP was
determined using DE52 membranes and scintillation counting, as described above. Results
shown in Fig. 5 indicate that the optimum temperature for DNA-dependent DNA
polymerase activity for rRT was 45°C-50°C; for nRT, the temperature optimum was
55°C. The DNA-dependent DNA polymerase activities of the RTs of the invention
broaden the range of applications amenable to use of these polypeptides. In addition to
copying DNA as well as RNA, the enzymes may be used in any of the above-mentioned
variety of amplification technologies known in the art. In addition, the polypeptides of the
invention may be used to sequence RNA or DNA targets using Sanger's enzymatic
approach as originally disclosed or any one of the many variations of that technique that
have been developed since that time.

Example 10

An rRT (i.e., M1BAhis) according to the invention was subjected
to an RNase H assay, using a protocol known in the art (Hillenbrand et al., Nucl. Acids
Res. 10:833 (1982)). Reactions (25 µl) contained 20 mM HEPES-KOH, pH 8.0 (23°C),
10 mM MgCl2, 50 mM KCl, 1 mM DTT, 0.24 mM [α-32P] poly(A)-poly(dT) (1:2; 15
µCi/ml), and 4 µl of diluted enzyme purified from M1BAhis as described above.

For control reactions, standard stocks of RNase H (Molecular Biology Resources,
Inc., Milwaukee, WI) with known activity were assayed in the range of 0.05 to 0.5
units/reaction (one unit of activity is defined as the amount of enzyme required to produce
1 nmol of acid soluble ribonucleotide from [α-32P] poly(A)-poly(dT) in 20 minutes at
37°C). Two reactions were run without enzyme to serve as negative controls.

A reaction mixture, less enzyme, was prepared and the reaction started by the
addition of enzyme. After 20 minutes of incubation at 37°C, the reaction was terminated
by adding 25 µl of cold yeast tRNA as co-precipitant (10 mg/ml in 0.1 M sodium acetate,
pH 5.0) followed by 200 µl of 10% trichloroacetic acid. Samples were then placed on ice
for at least 10 minutes. The mixtures were centrifuged for 7 minutes at 16,000 x g in an
Eppendorf microcentrifuge (Brinkmann Instruments, Westbury, NY), and 200 µl of the
supernatant fluid was withdrawn and counted in 5 ml of scintillation fluid.
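
The unit definition and the sampling scheme above imply a simple conversion from counted radioactivity to units of RNase H. The sketch below is illustrative only: the function name, the `cpm_per_nmol` specific-activity parameter, the subtraction of a no-enzyme blank, and the linear rescaling to a 20 minute window are assumptions introduced here, not values or steps taken from the source.

```python
# Sketch: convert counts from the acid-soluble supernatant into RNase H units.
# Volumes come from the protocol above; everything else is a labeled assumption.

REACTION_UL = 25.0      # reaction volume
CARRIER_UL = 25.0       # yeast tRNA co-precipitant added
TCA_UL = 200.0          # 10% trichloroacetic acid added
COUNTED_UL = 200.0      # supernatant withdrawn for counting
ASSAY_MINUTES = 20.0    # incubation time in the unit definition


def rnase_h_units(sample_cpm, blank_cpm, cpm_per_nmol, minutes=ASSAY_MINUTES):
    """Estimate the units of RNase H present in one reaction.

    One unit releases 1 nmol of acid-soluble ribonucleotide in 20 minutes
    at 37 °C. cpm_per_nmol (substrate specific activity) is a hypothetical
    input that must be measured for the substrate lot in use.
    """
    if cpm_per_nmol <= 0 or minutes <= 0:
        raise ValueError("cpm_per_nmol and minutes must be positive")
    total_ul = REACTION_UL + CARRIER_UL + TCA_UL
    # Only part of the supernatant is counted, so scale up to the full volume
    # (assumes the pellet volume is negligible).
    scale = total_ul / COUNTED_UL
    net_cpm = max(sample_cpm - blank_cpm, 0.0) * scale
    nmol_released = net_cpm / cpm_per_nmol
    # Normalize to the 20 minute window used in the unit definition
    # (assumes release is linear with time).
    return nmol_released * (ASSAY_MINUTES / minutes)
```

The scale factor of 250/200 reflects that 200 µl of a nominal 250 µl mixture is counted; the negative-control reactions described above would supply `blank_cpm`.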

The RNase H activity of the rRT at different temperatures was also tested using the
reaction mixture described above. The results are presented as counts per minute of
released radiolabeled ribonucleotide for each of two trials, as shown in Fig. 6A (black:
rRT (i.e., M1BAhis); gray-hatched: nRT). The data show that rRT had RNase H activity
comparable to that of native RT. In addition, rRT activity was assessed at a variety of
temperatures and the results presented in Fig. 6B showed that rRT was active over a wide
range of temperatures. The optimum temperature for RNase H activity of rRT was 50°C.
In contrast, RNase H activity was relatively low at temperatures of 37°C, 60°C and 65°C.
Because of differences in the temperature optima for RT RNase H activity and the other RT
activities, such as the RNA- and DNA-dependent DNA polymerase activities, the various
methods relying on RT activity may be optimized by adjusting the temperature to achieve
the desired mix of activities. For example, methods involving use of an RNase H activity
may be performed at temperatures relatively close to the 55°C temperature optimum for
the RNase H activity of rRT. Methods that benefit from decreased RNase H activity, such
as RT-PCR and RAMP, may be performed at 60-65°C to maintain a low level of RNase
H activity.

Example 11

A variety of polynucleotides were constructed that encoded modified RT fragments.
These modified RTs include α and β polypeptides that have been terminally modified by
deletion of a naturally occurring terminal region of the peptide to produce α-like and β-like
fragments retaining RNA-dependent DNA polymerase activity. Other modified RTs
according to the invention involve an α-like or β-like fragment attached at either the
N-terminus, C-terminus, or both termini to one or more peptides (those peptides including
simple homo-oligomeric peptides, preferably charged or bulky, and peptides containing
useful functionalities such as DNA binding, metal binding, structure stabilizing and
polymerizing [e.g., zinc finger domains, leucine zipper motifs, an NS1 binding site, GPRP
(single-letter amino acid identification) or its inverse PRPG, among others] capacities).
Yet other modified RTs according to the invention include fragments that lack a sequence
found internally in one of the native polypeptides, α or β.

Techniques used to construct polynucleotides encoding these modified RTs are
known in the art and described in Examples 1 and 3 above. Generally, the strategy was
to use PCR to construct the desired polynucleotide, which was then cloned and expressed
to produce the encoded modified RT. The expression studies were generally conducted as
described in Example 4.

Expression of eukaryotic genes in prokaryotes may result in production of
misincorporated, truncated and/or insoluble proteins (misfolding) due to the presence of rare
codons in those eukaryotic genes. Translation of these rare codons is limited by the regulated
expression of tRNAs corresponding to these rare codons. Hence, expression of eukaryotic
genes having abundant rare codons sometimes results in misincorporation, truncation and/or
misfolding. One approach to minimizing such problems is to clone the tRNA corresponding
to these rare codons and express the clone in E. coli in order to facilitate the expression of
eukaryotic genes. We have cloned and expressed the ArgU tRNA because the arginine
codons (AGG, AGA, CGA and CGG) represent the largest number of rare codons in AMV-RT.
Co-expression of AMV-RT and ArgU is expected to improve expression (i.e., activity levels)
of AMV-RT. tRNAs corresponding to other rare codons, such as leucine (CTA) and proline
(CCC), will also be cloned and co-expressed.
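
As an illustration of how the rare-codon burden of a coding region might be tallied, the following minimal sketch counts in-frame occurrences of the rare codons named above. The function name and the toy reading frame are hypothetical; the toy sequence is not derived from the AMV-RT gene, and the rare-codon set is limited to the six codons the text mentions.

```python
from collections import Counter

# Rare codons named in the text (Arg: AGG, AGA, CGA, CGG; Leu: CTA; Pro: CCC).
RARE_CODONS = {"AGG", "AGA", "CGA", "CGG", "CTA", "CCC"}


def count_rare_codons(coding_sequence, rare=RARE_CODONS):
    """Count in-frame occurrences of each rare codon in a DNA coding sequence."""
    seq = coding_sequence.upper().replace(" ", "")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # Split into non-overlapping triplets starting at the first base.
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return Counter(c for c in codons if c in rare)


# Toy reading frame for demonstration only (not from the AMV-RT gene).
toy = "ATGAGGCTGAGACCCCGTCTATAA"
print(count_rare_codons(toy))
```

Counting strictly in frame matters here: a triplet such as AGG that spans two adjacent codons is not translated as a rare arginine codon and is ignored by this sketch.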

Another approach to improved expression of the modified RTs of the invention in
prokaryotes is to change the rare codons in modified RT coding regions to frequently used
codons. Such changes can be readily effected by a variety of techniques known in the art,
e.g., site-directed mutagenesis using synthetic oligonucleotides. In an E. coli expression
system, there would be 90 rare codons (38 arginine, 23 proline, 15 isoleucine, 10 leucine and
4 serine codons) in the AMV-RT gene, all or some of which may be advantageously changed
to frequent codons. Changing all 90 rare codons to the frequent codons found in abundantly
expressed genes could imbalance host cell metabolism, however. To accommodate
deleterious effects on host cell metabolism arising from modified RT expression levels that are
too high, a library of clones may be constructed using, e.g., an M13-based approach to
site-directed mutagenesis involving oligonucleotide primer incorporation. Specifically, pools of
synthetic oligonucleotides, each oligonucleotide designed to convert one or a few rare codons
to frequent codons, and a template comprising a modified RT coding region may be used to
synthesize a collection of modified RTs having a range of 1-90 rare codon conversions.
Clones having RT activity may be isolated from this library by conventional screening
techniques (e.g., binding to radioactive substrate and activity assays, among others).
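
The idea of a library whose members carry different numbers of rare-to-frequent codon conversions can be sketched in silico. The example below is a hedged illustration, not the M13-based method itself: the function names are hypothetical, and the replacement codons (CGT, CTG, CCG) are an assumption of this sketch based on codons commonly preferred in E. coli; the source names only the rare codons.

```python
import random

# Hypothetical rare -> frequent synonymous substitutions for E. coli.
# Rare codons are those named in the text; the replacements are assumed here.
PREFERRED = {
    "AGG": "CGT", "AGA": "CGT", "CGA": "CGT", "CGG": "CGT",  # Arg
    "CTA": "CTG",                                            # Leu
    "CCC": "CCG",                                            # Pro
}


def convert_rare_codons(coding_sequence, n_conversions, seed=None):
    """Return a variant with n_conversions randomly chosen rare codons replaced."""
    seq = coding_sequence.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    rare_positions = [i for i, c in enumerate(codons) if c in PREFERRED]
    if not 0 <= n_conversions <= len(rare_positions):
        raise ValueError("n_conversions exceeds the number of rare codons")
    rng = random.Random(seed)
    # Pick which rare codons to convert, mimicking one member of a library.
    for i in rng.sample(rare_positions, n_conversions):
        codons[i] = PREFERRED[codons[i]]
    return "".join(codons)


def remaining_rare(coding_sequence):
    """Count rare codons still present in frame."""
    return sum(coding_sequence[i:i + 3] in PREFERRED
               for i in range(0, len(coding_sequence), 3))


# Toy reading frame (not from the AMV-RT gene) with four rare codons.
variant = convert_rare_codons("ATGAGGCTGAGACCCCGTCTATAA", 2, seed=1)
print(variant, remaining_rare(variant))
```

Because every substitution is synonymous, each variant encodes the same polypeptide; only the number of rare codons remaining differs between library members.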

To facilitate an understanding of the structures of the various polynucleotides and 
polypeptides disclosed in this Example, Table II below collects pertinent information. All 
constructions generated by PCR used a suitable, full-length coding region sequence as a 
template, such as the pol gene sequence found in M1-5,6.

Table II

Clone | Approx. length of coding region | Oligonucleotide additions/PCR primers | Added oligonucleotide locations (nucleotide numbering of SEQ ID NO:1) | Added oligonucleotide characteristics
M1BA (His6) | 1754 | FM1BA SmaI (SEQ ID NO:25); RM1BAhisXhoI extend (SEQ ID NO:45) | 253-269; 1986-1967 | 6 His codons
M1BA (His10) | 1766 | FM1BA SmaI (SEQ ID NO:25); RM1BA His10 (SEQ ID NO:46) | 253-269; 1986-1967 | 10 His codons
M1BA (His12) | 1772 | FM1BA SmaI (SEQ ID NO:25); RM1BA His12 (SEQ ID NO:47) | 253-269; 1986-1967 | 12 His codons
M1BA (Leu) | 1754 | FM1BA SmaI (SEQ ID NO:25); RM1BA Leu (SEQ ID NO:48) | 253-269; 1986-1968 | 6 Leu codons
Clone | Approx. length of coding region | Oligonucleotide additions/PCR primers | Added oligonucleotide locations (nucleotide numbering of SEQ ID NO:1) | Added oligonucleotide characteristics
M1BA (Lys) | | FM1BA SmaI (SEQ ID NO:25); RM1BA Lys (SEQ ID NO:49) | 253-269; 1986-1968 | 7 Lys codons
M1BA (Arg6) | | FM1BA SmaI (SEQ ID NO:25); RM1BA Arg6 (SEQ ID NO:50) | 253-269; 1986-1967 | 6 Arg codons
M1BA (Arg3, X4) | | FM1BA SmaI (SEQ ID NO:25); RM1BA Arg3X4 (SEQ ID NO:51) | 253-269; 1986-1967 | 3 Arg, 2 Asn, 1 Gln, 1 Tyr codon
M1BA (Asp6) | 1754 | FM1BA SmaI (SEQ ID NO:25); RM1BA Asp6 (SEQ ID NO:52) | 253-269; 1986-1968 | 6 Asp codons
M1BA (Asp4) | 1748 | FM1BA SmaI (SEQ ID NO:25); RM1BA Asp4 (SEQ ID NO:53) | 253-269; 1986-1968 | 4 Asp codons
M1BA (Asp5) | 1751 | FM1BA SmaI (SEQ ID NO:25); RM1BA Asp5 (SEQ ID NO:54) | 253-269; 1986-1968 | 5 Asp codons
M1BA (Asp8) | 1760 | FM1BA SmaI (SEQ ID NO:25); RM1BA Asp8 (SEQ ID NO:55) | 253-269; 1986-1968 | 8 Asp codons
Clone | Approx. length of coding region | Oligonucleotide additions/PCR primers | Added oligonucleotide locations (nucleotide numbering of SEQ ID NO:1) | Added oligonucleotide characteristics
M1BA (Asp12) | | FM1BA SmaI (SEQ ID NO:25); RM1BA Asp12 (SEQ ID NO:56) | 253-269; 1986-1968 | 12 Asp codons
M1BA (Glu6, XhoI) | | FM1BA SmaI (SEQ ID NO:25); RM1BA Glu6, XhoI (SEQ ID NO:57) | 253-269; 1986-1968 | 6 Glu codons
M1BA (Glu12) | | FM1BA SmaI (SEQ ID NO:25); RM1BA Glu12 (SEQ ID NO:58) | 253-269; 1986-1968 | 12 Glu codons
 | | FM1BA SmaI (SEQ ID NO:25); RM1BK 620 (SEQ ID NO:74) | 253-269; 2112-2092 |
M1BK 620 His | | FM1BA SmaI (SEQ ID NO:25); RM1BK 620 His (SEQ ID NO:60) | 253-269; 2112-2092 |
M1BK 640 XhoI | | FM1BA SmaI (SEQ ID NO:25); RM1BK 640 XhoI (SEQ ID NO:76) | 253-269; 2149-2169 |
M1BK 660 XhoI | 1982 | FM1BA SmaI (SEQ ID NO:25); RM1BK 660 XhoI (SEQ ID NO:77) | 253-269; 2210-2232 |

WO 00/42199
Clone | Approx. length of coding region | Oligonucleotide additions/PCR primers | Added oligonucleotide locations (nucleotide numbering of SEQ ID NO:1) | Added oligonucleotide characteristics
M1BK 680 XhoI | | FM1BA SmaI (SEQ ID NO:25); RM1BK 680 XhoI (SEQ ID NO:78) | 253-269 |
M1BK 760 XhoI | | FM1BA SmaI (SEQ ID NO:25); RM1BK 760 XhoI (SEQ ID NO:79) | 253-269; 2512-2532 |
M1BK 800 XhoI | | FM1BA SmaI (SEQ ID NO:25); RM1BK 800 XhoI (SEQ ID NO:80) | 253-269; 2628-2649 |
M1BK 640 His XhoI | | FM1BA SmaI (SEQ ID NO:25); RM1BK 640 His XhoI (SEQ ID NO:81) | 253-269; 2149-2169 | 6 His codons
M1BK 660 His XhoI | | FM1BA SmaI (SEQ ID NO:25); RM1BK 660 His XhoI (SEQ ID NO:82) | 253-269; 2210-2232 | 6 His codons
M1BK 680 His XhoI | | FM1BA SmaI (SEQ ID NO:25); RM1BK 680 His XhoI (SEQ ID NO:83) | 253-269; 2273-2292 | 6 His codons
Clone | Approx. length of coding region | Oligonucleotide additions/PCR primers | Added oligonucleotide locations (nucleotide numbering of SEQ ID NO:1) | Added oligonucleotide characteristics
M1BK 760 His XhoI | | FM1BA SmaI (SEQ ID NO:25); RM1BK 760 His XhoI (SEQ ID NO:100) | 253-269; 2512-2532 |
M1BK 800 His XhoI | | FM1BA SmaI (SEQ ID NO:25); RM1BK 800 His XhoI (SEQ ID NO:84) | 253-269; 2628-2649 |
M1BA (LZIP2 XhoI) | 1757 | FM1BA SmaI (SEQ ID NO:25); RM1BA LZIP2 XhoI (SEQ ID NO:61) | 253-269; 1986-1968 | Leucine zipper (2 copies)
M1BA (LZIP3 XhoI) | 1778 | FM1BA SmaI (SEQ ID NO:25); RM1BA LZIP3 XhoI (SEQ ID NO:62) | 253-269; 1986-1968 | Leucine zipper (3 copies)
M1BA (LZIP4 XhoI) | | FM1BA SmaI (SEQ ID NO:25); RM1BA LZIP4 XhoI (SEQ ID NO:63) | 253-269; 1986-1968* | Leucine zipper (4 copies)
M1BA (LZIP5 XhoI) | | FM1BA SmaI (SEQ ID NO:25); RM1BA LZIP5 XhoI (SEQ ID NO:64) | 253-269; 1986-1968* | Leucine zipper (5 copies)
Clone | Approx. length of coding region | Oligonucleotide additions/PCR primers | Added oligonucleotide locations (nucleotide numbering of SEQ ID NO:1) | Added oligonucleotide characteristics
M1BA (Cyst2) | 1742 | FM1BA SmaI (SEQ ID NO:25); RM1BA Cyst2 (SEQ ID NO:65) | 253-269 | 2 Cys codons
M1BA (Cyst6) | 1754 | FM1BA SmaI (SEQ ID NO:25); RM1BA Cyst6 (SEQ ID NO:66) | 1986-1968 | 6 Cys codons
M1BA (GPRP) | 1748 | FM1BA SmaI (SEQ ID NO:25); RM1BA GPRP (SEQ ID NO:67) | 253-269; 1986-1968 | GPRP motif
M1BA (PRPG) | 1748 | FM1BA SmaI (SEQ ID NO:25); RM1BA PRPG (SEQ ID NO:68) | 253-269; 1986-1968 |
M1BA (NS1 XhoI) | 1796 | FM1BA SmaI (SEQ ID NO:25); RM1BA NS1 XhoI (SEQ ID NO:98) | 253-269; 1986-1968 |
M1BA (WH) | | FM1BA SmaI (SEQ ID NO:25); RM1BA WH (SEQ ID NO:69) | 253-269; 1986-1966 |
M1BA (3 PPG XhoI) | | FM1BA SmaI (SEQ ID NO:25); RM1BA 3PPG XhoI (SEQ ID NO:70) | 1986-1968 | 3 "PPG" motifs
Clone | Approx. length of coding region | Oligonucleotide additions/PCR primers | Added oligonucleotide locations (nucleotide numbering of SEQ ID NO:1) | Added oligonucleotide characteristics
M1BA (Trp) | 1754 | FM1BA SmaI (SEQ ID NO:25); RM1BA TRP (SEQ ID NO:71) | 253-269; 1986-1968 | 6 Trp codons
M1BA (Nhis SmaI) | 1754 | FM1BA Nhis SmaI (SEQ ID NO:72); RM1BA XhoI (SEQ ID NO:59) | 253-269; 1986-1967 | 6 His codons
M1BA (NWH SmaI) | 1769 | FM1BA NWH SmaI (SEQ ID NO:73); RM1BA XhoI (SEQ ID NO:59) | 253-270; 1986-1967 |
DNPCR1 (D450N) | 1754 | FDNPCR1 (D450N) (SEQ ID NO:92); RDNPCR1 (D450N) (SEQ ID NO:93) | | Mismatch at position 1600 of SEQ ID NO:1
DNPCR2 (D505N) | | FDNPCR2 (D505N) (SEQ ID NO:94); RDNPCR2 (D505N) (SEQ ID NO:95) | | Mismatch at position 1765 of SEQ ID NO:1
M1BA (E484Q) | | FM1BA E484Q (SEQ ID NO:96); RM1BA E484Q (SEQ ID NO:97) | | Mismatch at position 1702 of SEQ ID NO:1
Clone | Approx. length of coding region | Oligonucleotide additions/PCR primers | Added oligonucleotide locations (nucleotide numbering of SEQ ID NO:1) | Added oligonucleotide characteristics
Core domain deletion- Fragment 1a | | FM1BA SmaI (SEQ ID NO:25) and RM1BK 620 XhoI (SEQ ID NO:74); and F Cint XhoI (SEQ ID NO:85); R Cint SalI (SEQ ID NO:86) | 253-269; 2092-2112; 2560-2580; 2788-2811 |
Core domain deletion- Fragment 1b | | FM1BA SmaI (SEQ ID NO:25) and RM1BK 640 XhoI; and F Cint XhoI (SEQ ID NO:85); R Cint SalI (SEQ ID NO:86) | 253-269; 2149-2169; 2560-2580; 2788-2811 |
Core domain deletion- Fragment 1c | 2233 | FM1BA SmaI (SEQ ID NO:25) and RM1BK 660 XhoI; and F Cint XhoI (SEQ ID NO:85); R Cint SalI (SEQ ID NO:86) | 253-269; 2210-2232; 2560-2580; 2788-2811 |
Core domain deletion- 3' fragment 2a | | FM1BA SmaI (SEQ ID NO:25) and RM1BK 620 XhoI (SEQ ID NO:74); and F Cint XhoI (SEQ ID NO:85); R Cint His SalI (SEQ ID NO:87) | 253-269; 2092-2112; 2560-2580; 2788-2811 | 6 His codons
Clone | Approx. length of coding region | Oligonucleotide additions/PCR primers | Added oligonucleotide locations (nucleotide numbering of SEQ ID NO:1) | Added oligonucleotide characteristics
Core domain deletion- 3' fragment 2b | | FM1BA SmaI (SEQ ID NO:25) and RM1BK 640 XhoI; and F Cint XhoI (SEQ ID NO:85); R Cint His SalI (SEQ ID NO:87) | 253-269; 2149-2169; 2560-2580; 2788-2811 | 6 His codons
Core domain deletion- 3' fragment 2c | | FM1BA SmaI (SEQ ID NO:25) and RM1BK 660 XhoI; and F Cint XhoI (SEQ ID NO:85); R Cint His SalI (SEQ ID NO:87) | 253-269; 2210-2232; 2560-2580; 2788-2811 |
Core domain deletion- 3' fragment 3a | 2155 | FM1BA SmaI (SEQ ID NO:25) and RM1BK 620 XhoI (SEQ ID NO:74); and F Cint 731 SalI (SEQ ID NO:88); RCint 830 XhoI (SEQ ID NO:90) | 253-269; 2092-2112; 2443-2463; 2736-2716 |
Core domain deletion- 3' fragment 3b | | FM1BA SmaI (SEQ ID NO:25) and RM1BK 640 XhoI; and F Cint 731 SalI (SEQ ID NO:88); RCint 830 XhoI (SEQ ID NO:90) | 2149-2169 |
Clone | Approx. length of coding region | Oligonucleotide additions/PCR primers | Added oligonucleotide locations (nucleotide numbering of SEQ ID NO:1) | Added oligonucleotide characteristics
Core domain deletion- 3' fragment 3c | 2275 | FM1BA SmaI (SEQ ID NO:25) and RM1BK 660 XhoI; and F Cint 731 SalI (SEQ ID NO:88); RCint 830 XhoI (SEQ ID NO:90) | 253-269; 2210-2232; 2443-2463; 2736-2716 |
Core domain deletion- 3' fragment 4a | | FM1BA SmaI (SEQ ID NO:25) and RM1BK 620 XhoI (SEQ ID NO:74); and F Cint 751 SalI (SEQ ID NO:89); RCint 830 XhoI (SEQ ID NO:90) | 253-269; 2092-2112; 2497-2517; 2736-2716 |
Core domain deletion- 3' fragment 4b | 2158 | FM1BA SmaI (SEQ ID NO:25) and RM1BK 640 XhoI; and F Cint 751 SalI (SEQ ID NO:89); RCint 830 XhoI (SEQ ID NO:90) | 253-269; 2149-2169; 2497-2517; 2736-2716 |
Core domain deletion- 3' fragment 4c | | FM1BA SmaI (SEQ ID NO:25) and RM1BK 660 XhoI; and F Cint 751 SalI (SEQ ID NO:89); RCint 830 XhoI (SEQ ID NO:90) | 253-269; 2210-2232; 2497-2517; 2736-2716 |
Clone | Approx. length of coding region | Oligonucleotide additions/PCR primers | Added oligonucleotide locations (nucleotide numbering of SEQ ID NO:1) | Added oligonucleotide characteristics
Core domain deletion- 3' fragment 5a | 2032 | FM1BA SmaI (SEQ ID NO:25) and RM1BK 620 XhoI (SEQ ID NO:74); and F Cint 771 SalI (SEQ ID NO:99); RCint 830 XhoI (SEQ ID NO:90) | 253-269; 2092-2112; 2566-2586; 2736-2716 |
Core domain deletion- 3' fragment 5b | 2089 | FM1BA SmaI (SEQ ID NO:25) and RM1BK 640 XhoI; and F Cint 771 SalI (SEQ ID NO:99); RCint 830 XhoI (SEQ ID NO:90) | 253-269; 2149-2169; 2566-2586; 2736-2716 |
Core domain deletion- 3' fragment 5c | 2152 | FM1BA SmaI (SEQ ID NO:25) and RM1BK 660 XhoI; and F Cint 771 SalI (SEQ ID NO:99); RCint 830 XhoI (SEQ ID NO:90) | 253-269; 2210-2232; 2566-2586; 2736-2716 |

* Oligonucleotides hybridize to an internal region of oligonucleot... which in turn recognizes the indicated region of SEQ ID NO:1.

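
The nucleotide positions listed in Table II for the point mutants can be related to residue numbers by simple codon arithmetic. The sketch below is illustrative and rests on one assumption that the table implies but does not state outright: that the coding region begins at nucleotide 253 of SEQ ID NO:1 (the start of the 5' primer location). The function names are hypothetical.

```python
# Sketch relating residue numbers (SEQ ID NO:2) to nucleotide positions
# (SEQ ID NO:1). CODING_START = 253 is inferred from the 5' primer location
# in Table II and is an assumption of this example.

CODING_START = 253


def codon_start(residue):
    """Nucleotide position of the first base of the codon for a 1-based residue."""
    if residue < 1:
        raise ValueError("residue numbering is 1-based")
    return CODING_START + 3 * (residue - 1)


def codon_end(residue):
    """Nucleotide position of the last base of the codon for a residue."""
    return codon_start(residue) + 2


# Point mutants listed in Table II and the mismatch positions reported there.
reported = {"D450N": 1600, "D505N": 1765, "E484Q": 1702}
for name, position in reported.items():
    residue = int(name[1:-1])  # e.g. "D450N" -> 450
    print(name, codon_start(residue), position)
```

Under this assumption the computed codon starts coincide with the three mismatch positions in the table, and the codon for residue 578 ends at nucleotide 1986, the position at which the 3' primers for the M1BA constructs begin.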
A. Terminally deleted RTs 

The full-length RT coding region was truncated by deletions using conventional
methodologies described above (e.g., Example 3). One set of deletion derivatives lacked
the 3' end of the MAV-RT coding region to varying extents. Again, relative to the
full-length gene (SEQ ID NO:1), the 3' (C-terminal) deletion extending to the KpnI site
(M1KA; see SEQ ID NO:8) increased the RT expression level, as evidenced by
SDS-PAGE. Relative to the full-length gene (SEQ ID NO:1), deletion of the region extending
from the BglII site to the 3' terminus (M1BA; see SEQ ID NO:6) also increased RT
expression and activity, as evidenced by SDS-PAGE and activity assays (see below). The
C-terminally truncated RTs (M1KA and M1BA) have lengths that fall in between the
lengths of the native α and β polypeptides. Relative to the alpha fragment of MAV-RT,
the beta fragment has an additional 254 amino acids at the C-terminus, which provides an
integrase activity. This region of the polypeptide contributes to the insolubility of the
polypeptide and reduces its recovery from cell extracts, as shown by the relative
insolubility of a (+) integrase form of RT (e.g., the M1KA gene product, see below)
compared to a (-) integrase form (e.g., the M1BA gene product). Because the integrase
domain is only needed for the retroviral life cycle and not for the RNA- or DNA-dependent
DNA polymerase activities, this region was deleted in M1BA (α-like fragment). Note that
the M1BA α-like fragment (amino acids 1-578 of SEQ ID NO:2) is larger than the
naturally occurring α fragment of MAV-RT (amino acids 1-573 of SEQ ID NO:2).
Without wishing to be bound by theory, this deletion was expected to result in an increase
in the solubility, and hence recovery, of the protein.

A series of clones was constructed to express the M1BA and M1KA series of
modified RTs, which have C-terminal deletions in order to increase the levels of expression
and to stabilize the RT activity (RNA-dependent DNA polymerase activity). Convenient
restriction sites in full-length clones such as pMBacRT and pHRT, e.g., BglII (spanning
nucleotides 1,986-1,991 of SEQ ID NO:1) and KpnI (spanning nucleotides 2,745-2,750
of SEQ ID NO:1), were used to eliminate the 3' end of the coding region of the RT gene
(see Table I). The 3' deletion derivatives, encoding RT polypeptide fragments having
C-terminal deletions, were obtained by BglII-PstI or KpnI-PstI restrictions of pMBacRT and
pHRT, respectively (BglII and KpnI sites in the MAV-RT coding region; PstI site in the
vector). Recombinant molecules containing the BglII-PstI 3' terminal deletion were
designated pBacM1BA and pHBRT (pH33ABP6) and recombinant molecules containing
the KpnI-PstI deletion were designated pBacM1KA and pHKRT (pH33AKP5). The
deletion derivatives pBacM1BA and pBacM1KA had approximately 1.17 and 0.4 kb
deletions from the 3' end of the full-length gene (see SEQ ID NO:1), respectively. The
fragment bounded at its 3' end by the BglII site (SEQ ID NO:6) was used to express an
alpha-like RT fragment (the α-like fragment, M1BA, contained amino acids 1-578 of SEQ
ID NO:2; native MAV-RT α contains amino acids 1-572 of SEQ ID NO:2) and the
fragment bounded by the KpnI site (SEQ ID NO:8) was used to express a beta-like RT
fragment (the β-like fragment, M1KA, contained amino acids 1-832 of SEQ ID NO:2;
native MAV-RT β contains amino acids 1-858 of SEQ ID NO:2).
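
As a rough cross-check of the coordinates quoted above, the following Python sketch maps a
nucleotide position in SEQ ID NO:1 to the codon that contains it. It assumes, purely for
illustration, that codon 1 of the RT coding region begins at nucleotide 253 (the 5'
coordinate listed for the FM1BA primer in Table II); that starting point, the constant
CDS_START and the function codon_index are hypothetical and are not part of the disclosure.

```python
# Assumption (not stated in the text): codon 1 of the RT coding region
# begins at nucleotide 253 of SEQ ID NO:1.
CDS_START = 253


def codon_index(nt_pos: int, cds_start: int = CDS_START) -> int:
    """Return the 1-based codon number containing a 1-based nucleotide position."""
    if nt_pos < cds_start:
        raise ValueError("position lies upstream of the assumed coding start")
    return (nt_pos - cds_start) // 3 + 1


# First nucleotide of each restriction site, as quoted in the text.
sites = {"BglII": 1986, "KpnI": 2745}
for name, pos in sites.items():
    print(name, "site begins in codon", codon_index(pos))
```

Under that assumption the BglII site begins in codon 578, in line with the 578-amino-acid
M1BA fragment, and the KpnI site begins in codon 831, close to the 832-residue C-terminus
reported for M1KA.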

Miniprep and sequencing analyses were done to confirm the identities of the
recombinant clones described above. Recombinant viruses obtained from co-transfection with
virus BacPak6 and transfer vector pBacM1BA or pBacM1KA were called M1BA and M1KA,
respectively.

B. Alpha-like recombinants encoding non-native terminal peptides 

1. Simple peptide tags

One category of α fragment modifications was designed to mimic one or more
properties of the integrase domain found in the β fragment but missing from the α fragment
of Type III RTs. Partial mimicking of the integrase domain, without the deleterious impact
on solubility and host cell viability associated with the native integrase domain, was
accomplished by adding polynucleotide sequences encoding His tags at the 3' ends of the
modified RT coding regions.

A His-tag addition to the C-terminus of an RT polypeptide was achieved by
recombinant expression of a polynucleotide containing an RT coding region fused in-frame
to His codons. In particular, the fusions were constructed by adding oligonucleotides
containing 6 histidine codons to the 3' end of the RT gene using ligase, as in the case of the
construction of pBacM1KAhis, or by PCR amplification with oligonucleotides that specified
6 histidine codons, as in the case of the construction of pBacM1BAhis.
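
The PCR route to an in-frame His fusion can be illustrated with a short Python sketch that
assembles a reverse primer specifying six histidine codons. It is only a sketch under
stated assumptions: the annealing sequence is invented for the example and is not taken
from SEQ ID NO:1, and the choice of CAT as the His codon, TAA as the stop codon and an
XhoI site (CTCGAG) at the primer's 5' end are illustrative choices rather than the primers
actually used.

```python
# Watson-Crick complements for an unambiguous DNA alphabet.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def his_tag_reverse_primer(cds_3prime: str, n_his: int = 6,
                           his_codon: str = "CAT", stop: str = "TAA",
                           site: str = "") -> str:
    """Build a reverse primer whose 3' end anneals to the end of the coding
    strand and whose 5' tail adds His codons, a stop codon and an optional
    restriction site."""
    if len(cds_3prime) % 3 != 0:
        raise ValueError("annealing region must be whole codons to stay in frame")
    # Write the desired product in the sense orientation, then flip it.
    sense = cds_3prime.upper() + his_codon * n_his + stop + site
    return reverse_complement(sense)


# Hypothetical 18-nt annealing region (six codons), NOT from SEQ ID NO:1.
primer = his_tag_reverse_primer("GCTGAAACCGGTCTGAAA", site="CTCGAG")
print(primer)
```

Writing the fusion in the sense orientation first and reverse-complementing it at the end
keeps the reading frame easy to check by eye: the annealing region must be a whole number
of codons so that the His codons follow in frame.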

The basic nature of the added His amino acids was expected to increase binding to the
negatively charged nucleic acids, enhancing the stability of the polypeptides. The increased
stability, in turn, was expected to result in increased activity of amino-acid-tagged RTs,
relative to their untagged counterparts. In addition, the His tags were expected to chelate
metal ions (e.g., Ni2+), thereby potentiating polymerization of the modified RT. A His-tagged
RT (M1BAhis) was found in homo-polymeric form (molecular weight greater than 200 kDa),
as determined using non-denaturing PAGE and molecular sieve chromatography with
Superose 12HR 10/30 (separation range of 1-300 kDa, Pharmacia-Upjohn).

Expression levels of the RT fragments modified by amino acid tagging showed that
the structurally unstable alpha fragment was stabilized by addition of peptide tags to the
C-terminus of the AMV-RT alpha fragment.

Other modified RTs bearing peptides at the C-terminus of the α-like fragment were
generated by PCR, as described above. The forward and reverse PCR primers had codons
corresponding to the N- and C-termini of the AMV-RT alpha fragment, along with codons
corresponding to the peptide tags to be added. A linearized template (pHSL-M1) containing
the full-length RT gene was used for the PCR amplifications. Additional information
concerning this class of modified RTs, as well as the polynucleotides encoding them, is found
in Table II.

The PCR product was restricted with a suitable restriction enzyme and ligated to
pBacPakO that was digested with a compatible enzyme. The selected recombinants were
sequenced to confirm addition of the appropriate tags.

2. C-terminal peptides exhibiting DNA binding properties

DNA binding motifs of proteins may have either general affinity (i.e., non-specific
binding) or sequence-specific affinity for DNA. Several nucleic acid binding domains have
been identified and reported to play a role in important cellular functions such as viral
packaging, transcriptional and translational regulation, transport between the nucleus and
cytoplasm, splicing, and stability, among others. Karaya et al., J. Biol. Chem.
266:11621-11627 (1991); Burd et al., Science 265:615-621 (1994); Weiss et al., Biopolymers
48(2-3):167-180 (1998); Nassal, M., J. Virol. 66(7):4107-16 (1992); Ritt et al., Biochemistry
37:2673-81 (1998). DNA binding domains with general affinity are preferable to
target-specific binding domains because of the reduced substrate specificity of modified RTs
having such general binding domains.

Several basic amino acids are known to enhance the affinity of a protein for nucleic
acid templates. The positive charges of arginine, lysine, and histidine increase the non-specific
affinity of polypeptides containing such residues for nucleic acid, thereby facilitating the search
for specific binding sites. Several arginine-rich motifs and arginine-lysine-rich motifs have
been identified in nucleic acid binding domains. The arginine-lysine-rich motif
LLKIKRLRKKFAQKMLRKARRK is involved in RNA binding, which could enhance the
activity of RT. In addition, a lysine-rich protein is associated with DNA in the kinetoplast and
plays a role in segregation of the kinetoplast DNA. Hines, Mol. and Biochem. Parasitol.
94:41-52 (1998). Similarly, acidic amino acid tags are reported to be involved in packaging
of viral DNA. The packaging may be mediated through metal ions that have affinity for DNA.
International Patent Publication No. WO 98/07869. Additionally, charged amino acids are
present on the surface of structural proteins and may play a role in stabilizing secondary
structures.

The addition of histidine, glutamic acid, and aspartic acid tags enhanced the activity
of the alpha fragment 20- to 100-fold. A peptide tag consisting of six arginine residues improved
the activity five-fold. However, specific arginine-rich motifs such as RNRNRQY (Arg3X4,
found at the C-terminus of the GP67 envelope glycoprotein proposed to be involved in
baculoviral DNA packaging) enhanced (i.e., increased or prolonged) activity by 20- to
40-fold. Other RNA- and DNA-binding motifs such as RRRDRGRS are expected to yield similar
results. However, six continuous lysine residues did not increase the activity. A higher
number of lysine residues or correct spacing of the lysine residues may be required for
enhancement of function.
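
The charged-residue content of the peptide tags discussed in this section can be tallied
with a few lines of Python. This is a minimal sketch, not an analysis performed in the
disclosure: it simply counts residues, treats histidine as basic (as the text does), and
uses hypothetical names (charge_profile, TAGS). The homopolymer tags are written out from
their descriptions (e.g., six arginines).

```python
BASIC = set("KRH")   # lysine, arginine, histidine (His counted as basic, per the text)
ACIDIC = set("DE")   # aspartic acid, glutamic acid


def charge_profile(peptide: str) -> dict:
    """Count basic and acidic residues in a one-letter-code peptide."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("empty peptide")
    basic = sum(aa in BASIC for aa in seq)
    acidic = sum(aa in ACIDIC for aa in seq)
    return {"length": len(seq), "basic": basic, "acidic": acidic,
            "basic_fraction": basic / len(seq)}


# Tags named in the text; keys are labels chosen for this example.
TAGS = {
    "Arg3X4": "RNRNRQY",
    "RRRDRGRS": "RRRDRGRS",
    "arg6": "R" * 6,
    "lys6": "K" * 6,
    "his6": "H" * 6,
    "asp6": "D" * 6,
    "glu6": "E" * 6,
}

for label, seq in TAGS.items():
    print(label, charge_profile(seq))
```

A residue count of this kind says nothing about spacing, which the text suggests may matter
for the lysine tags; it is only a quick way to compare tag compositions.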

The enhancement of activity by these tags could result from increased structural
stability of the recombinant or from stability resulting from direct or metal-mediated
nucleic acid binding.

    M1BA               2000-3000 Units/g of insect cells
    M1BA his           50,000-200,000 U/g
    M1BA arg6          15,750 U/g
    M1BA lys6          2050 U/g
    M1BA Arg3X4        57,000 U/g
    M1BA glu6          170,000 U/g
    M1BA asp6          40,000 U/g
    M1BA leu6          2250-3900 U/g
    Nhis M1BA asp4     95,000 U/g
    Nhis M1BA asp5     115,250 U/g
    Nhis M1BA asp6     236,250 U/g
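
Because the untagged M1BA baseline is itself reported as a range, each fold-enhancement is
also a range. The following Python sketch shows one way to bound it, dividing the lowest
tagged value by the highest baseline value and the highest tagged value by the lowest
baseline value. The function name and the choice of bounds are assumptions made for
illustration; the text does not say how its fold figures were derived.

```python
BASELINE = (2000, 3000)  # untagged M1BA, U/g of insect cells (range from the table)

# (low, high) activities in U/g; single values are entered as low == high.
TAGGED = {
    "M1BA his": (50000, 200000),
    "M1BA arg6": (15750, 15750),
    "M1BA lys6": (2050, 2050),
    "M1BA Arg3X4": (57000, 57000),
    "M1BA glu6": (170000, 170000),
    "M1BA asp6": (40000, 40000),
}


def fold_range(tagged, baseline=BASELINE):
    """Return (minimum, maximum) fold change given two (low, high) ranges."""
    low = tagged[0] / baseline[1]   # most conservative ratio
    high = tagged[1] / baseline[0]  # most generous ratio
    return low, high


for name, activity in TAGGED.items():
    low, high = fold_range(activity)
    print(f"{name}: {low:.1f}- to {high:.1f}-fold")
```

On these bounds the six-arginine tag comes out at roughly 5- to 8-fold, consistent with the
"five-fold" figure in the text, while the six-lysine tag stays at or below baseline.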

Most of the sequence-specific DNA binding proteins have a general basic region and
a sequence-specific region for binding to DNA. There are several sequence-specific DNA
binding motifs such as zinc-finger domains (TFIIIA; CX2CX12HX3H) and the basic
region of the bZIP family of proteins. Similarly, there are arginine-rich domains such as
TRQARRNRRRWRARQR and YGRKKRRQRRRP that recognize specific RNA sequences
that are also expected to enhance the activity of RT. The N-terminus of the RT integrase
domain has a zinc-finger-like (HX3HX23CX2C) motif. This N-terminus binds zinc and has
been reported both to induce proper folding of the N-terminus and to be remarkably
thermostable as well. Burke et al., J. Biol. Chem. 267:9639-44 (1992). Because the
full-length MAV-RT gene has a zinc-finger-like domain, the reverse primer used in some PCR
amplifications included this region of the integrase (see Table II).
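
The zinc-finger-like spacing pattern (a histidine, three residues, a histidine, 23 residues,
a cysteine, two residues, a cysteine) translates directly into a regular expression. The
Python sketch below scans a protein sequence for that pattern; the test sequence is
synthetic and is not the MAV-RT integrase domain, and the names ZINC_FINGER_LIKE and
find_motif are hypothetical.

```python
import re

# H-X3-H-X23-C-X2-C, where X is any residue (spacing as written in the text).
ZINC_FINGER_LIKE = re.compile(r"H.{3}H.{23}C.{2}C")


def find_motif(protein: str):
    """Return (start, end) 1-based inclusive coordinates of each match."""
    return [(m.start() + 1, m.end())
            for m in ZINC_FINGER_LIKE.finditer(protein.upper())]


# Synthetic sequence built to contain exactly one match; NOT a real RT sequence.
demo = "MA" + "H" + "AAA" + "H" + "G" * 23 + "C" + "SS" + "C" + "KL"
print(find_motif(demo))
```

A fixed-spacing pattern like this is strict; a search intended to tolerate variation in
the linker lengths would need ranges in place of the exact repeat counts.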

A beta-like derivative (620 amino acids) containing the zinc-finger-like motif was
more active than the non-tagged alpha fragment (578 amino acids) and expressed 30,000 units
per gram of cell pellet.

    M1BK620            31,950 U/g
    M1BK620 his        50,000-140,000 U/g

The addition of the sequence-specific, zinc-finger-like motif produced a lower level of RT
activity than the His-tagged fragment, however. These results suggest that a general nucleic
acid binding domain (His tag) may enhance RT activity to a greater extent than a
sequence-specific domain (zinc-finger-like motif) and, therefore, could replace the
sequence-specific zinc-finger-like motif of RT, leading to an increase in activity. General
nucleic acid binding domains enhance the stability of both the 578- and the
620-amino-acid-length fragments.

3. C-terminal peptide tags having polymerization domains

Disulfide bond-forming domains (i.e., cysteine-rich regions) present in immunoglobulin
genes are involved in disulfide bond formation between the light and heavy chains. Hence,
addition of two cysteine residues at the C-terminus was anticipated to promote dimer
formation through disulfide bonding.

Addition of two cysteine residues enhanced the activity of the alpha-like fragment;
however, 6 contiguous cysteine residues reduced the activity of the modified RT.

    M1BA               2000-3000 U/g
    M1BA cyst2         190,000 U/g
    M1BA cyst6         720 U/g

The GPRP (fibrin clotting) tetrapeptide is the primary polymerization pocket of the
blood clotting protein fibrin. This domain is exposed at the amino terminus of fibrin
monomers by proteolytic cleavage of the precursor protein. The domain then polymerizes by
binding to complementary binding sites on other fibrinogen molecules to form clots. Because
peptides were being added to the C-terminus of α-like constructs, the reverse-sequence
tetrapeptide, PRPG, was also examined.

Addition of GPRP enhanced the RT activity approximately 50-fold, while addition of
PRPG enhanced the activity of RT by approximately 100-fold. In other embodiments, the
D-isomers of amino acids are used in peptide tags. For example, D-isomers are used in
generating PRPG peptides for use in preparing modified RTs of the invention.

    M1BA               2000-3000 U/g
    M1BA GPRP          107,500 U/g
    M1BA PRPG          243,500 U/g

Histidine residues can also promote dimer formation mediated by metal ions. The
addition of 6 His residues to the C-terminus of the α-like RT resulted in a 20- to 40-fold
increase in activity. Additions of different length histidine tags are contemplated.

    M1BA               2000-3000 U/g
    M1BA his           50,000-200,000 U/g

NS1 is a DNA-binding protein produced by the minute virus of mice. The protein has
replicational and transcriptional functions. Homo-oligomerization of NS1 is required for its
function and a small region, N-VETTVTTAQETKRGRIQTK-C, of NS1 has been identified
as the domain involved in oligomerization. Pujol et al., J. Virol. 71:7393-7403 (1997).
Addition of this peptide tag to the C-terminus of AMV-RT fragments enhanced RT activity.

    M1BA               2000-3000 U/g
    M1BA NS1           380,000 U/g

4. C-terminal peptide tags having metal binding domains

Histidine tags can be used as metal binding domains, as explained above. In addition,
modified RTs having C-terminal His tags were constructed and subjected to expression
analyses. The results, presented above, indicate that peptide tags having metal binding
capacity enhance RT expression.

Zinc fingers also exhibit metal binding capacity and are also involved in DNA binding.
As described above, the N-terminus of the integrase domain of MAV-RT has a zinc-finger-like
(HX3HX23CX2C) motif. This N-terminus binds zinc and has been reported to induce proper
folding of the N-terminus. It is expected that peptide tags containing one or more
zinc-finger-like domains will enhance the activity of modified RTs in which they are found.

5. C-terminal peptide tags having structure-stabilizing domains

Other embodiments of the invention involve the addition of domains designed to
structurally stabilize the alpha-like fragment so that it no longer requires a second fragment
for structural stability. There are several motifs that have been identified and shown to form
specific structures, such as alpha helices, beta sheets, and coils, among others, all of which are
known in the art. Formation of defined structures facilitates the formation of active domains
and promotes interactions with other such domains. Beta strands and beta sheets frequently
promote aggregation in, and precipitation from, solution. Desjarlais et al., Curr. Opin. in
Biotechnol. 6:460-466 (1995). Hence, most of the C-terminal tag additions were capable of
forming helices or coils. These secondary structure predictions are based on the well-known
Chou and Fasman algorithms.

The WEAAH (WH) motif, comprising histidine and tryptophan, promotes formation
of alpha helices, or defined structures, thereby giving structural stability to the protein.

    M1BA               2000-3000 U/g
    M1BA WH            104,720 U/g

Addition of the WH domain may extend the helix at the C-terminus, thereby enhancing the
stability of the alpha fragment. Regardless of the reason, however, modified RTs containing
a WH motif exhibit enhanced RT activity.

The "PPG" triple-helical domain is responsible for binding interactions in the structural
protein collagen. This motif is responsible for the structural stability and proper assembly of
collagen. Addition of peptides containing this motif in generating modified RTs according to
the invention is expected to enhance the activity of such RTs relative to corresponding RTs
lacking such peptides.

Addition of tryptophan residues is predicted to extend the α-helix at the C-terminus
and to enhance the stability of the alpha-like fragment. Tryptophan is a bulky amino acid and
could substitute for histidine tags in providing structural stability. Comparative assays showed
that a domain comprising Trp residues enhanced RT activity approximately 50-fold.

    M1BA               2000-3000 U/g
    M1BA Trp           96,500 U/g

The GPRP and PRPG motifs identified in fibrin as the domains involved in interaction
with other clotting proteins enhance the activity of the AMV-RT alpha-like fragment. This
motif is predicted to form coil-turn-coil structures.

    M1BA               2000-3000 U/g
    M1BA GPRP          107,500 U/g
    M1BA PRPG          243,500 U/g

The NS1 domain primarily forms beta sheets and coils. The presence of hydrophobic
residues alone is not very desirable because they form beta sheets and are typically buried in
the secondary structure of the protein. This may affect the natural folding of domains. Hence,
a motif that had a mixture of coils and beta sheets was chosen for analysis. Addition of this
domain produced an active α-like fragment that appeared to be stable.

    M1BA               2000-3000 U/g
    M1BA NS1           380,000 U/g

The leucine zipper motif is a helix-turn-helix motif which has been reported to
dimerize by a coiled-coil interaction. This defined structure of the leucine zipper is expected
to enhance the stability of the alpha-like fragment in addition to providing dimerization
abilities.

    M1BA               2000-3000 U/g
    M1BA Lzip23        7170 U/g
    M1BA Lzip3         1620 U/g

Addition of a single heptad repeat enhanced the activity by 2- to 3-fold. Addition of two
heptad repeats did not improve the activity. However, additions of 4-5 heptad repeats
produced RTs that had reduced activity levels.

6. N-terminal peptide tags

Consistent with the description in Examples 3 and 4 of N-terminal peptide tags being
added to modified RTs that exhibited enhanced expression, several constructs were generated
and characterized. One modified RT, Nhis M1BA, contained a His tag attached to the
N-terminus of an α-like fragment. Other RTs were modified to contain peptide tags at both
termini (Nhis M1BA asp 4, Nhis M1BA asp 5, Nhis M1BA asp 6, and Nhis M1BA WH).
Expression studies conducted as described in Example 4 led to the results shown below.

    Nhis M1BA          10,000-41,700 U/g
    M1BAChis           50,000-200,000 U/g
    Nhis M1BA asp 4    95,000 U/g
    Nhis M1BA asp 5    115,250 U/g
    Nhis M1BA asp 6    236,250 U/g
    Nhis M1BA WH       86,000 U/g

Expression of M1BAChis was measured to provide a relative control for the measurement of
Nhis M1BA expression. The results show that activity of RTs modified by a His tag present
at either the N-terminus or the C-terminus is increased relative to untagged RTs. Other
variations, such as the addition of peptide tags to both termini of an RT (e.g., an N-terminal
His tag coupled to a C-terminal Asp-, Glu-, or Trp-His- (i.e., WH) tag), are also contemplated
by the invention. Large-scale expression studies have shown that similar activity levels of
approximately 100,000 units/g insect cells are achieved with M1BA asp (N-terminally
modified RT) and Nhis M1BA asp (RT having 6 His residues at the N-terminus and 4-6 Asp
residues at the C-terminus).

7. Peptide tagging of other Type III RTs

The strategies described above were also used to modify RTs from other avian
sources, such as Rous Sarcoma Virus and Avian Tumor Virus. The C-terminal addition of
a six-histidine peptide tag to an alpha fragment of each of these avian RTs substantially
increased the RT activity, relative to the non-tagged AMV-RT α-like fragment.

    M1BA               2000-3000 U/g
    RSV-RT             43,350 U/g
    ATV-RT             71,900 U/g

Therefore, the modification strategies applied to AMV-RT polynucleotides and polypeptides
are applicable more generally to dimeric (i.e., Type II and Type III) reverse transcriptase
coding regions and polypeptides, and all of these modified RTs fall within the scope of the
present invention.

C. Beta-like recombinants

Modifications of β RT

Polynucleotides encoding a variety of beta-like modified RTs were constructed using
the techniques described in Example 3 and expressed using the techniques described in
Example 4, along with M1-5,6 encoding the full-length AMV-RT. Expression of the
full-length beta fragment resulted in low levels of highly insoluble, full-length protein, in both a
eukaryotic (insect cell) and a prokaryotic (E. coli) host. Because expression of the full-length
beta fragment resulted in mostly insoluble protein, the native beta polypeptide was modified
in an effort to increase its solubility and, hence, activity. One strategy for modifying the β
fragment involved deletions of parts of the native β RT. The native beta coding region
specifies 858 amino acids and the full-length β-like fragment disclosed herein consists of 832
amino acids. Thus, the β-like fragment lacks the 26 C-terminal amino acids of full-length
native β. Expression of the full-length β-like polypeptide, relative to the full-length native β,
showed an increase of one-hundred-fold in expression, as evidenced by SDS-PAGE analysis;
however, the β-like polypeptide was still highly insoluble (approximately 90% insoluble),
resulting in a five-fold increase in activity.

    M1KA               1000 U/Liter of cells
    M1KAhis            2,000 U/L
    M1-5               200 U/L

Modified RTs having C-termini between 580 and 832 amino acids (see SEQ ID NO:2)
are also contemplated by the invention. Because both the 580- and the 620-amino-acid
recombinants are soluble, and the 832- and 858-amino-acid recombinants are relatively
insoluble, deletions that truncate the C-terminus to a position between 580-832 amino acids
are expected to result in modified β-like polypeptides that are soluble. In particular
embodiments, the β-like polypeptide has a C-terminus at any one of positions 580-832, such
as positions 620, 640, 660, 740, 780, or 800 (SEQ ID NO:2), resulting from deletions that
eliminate 237, 217, 197, 117, 77 and 57 amino acids, respectively, relative to the full-length
β RT. Construction and expression of a deletion derivative specifying a modified β-like RT
of 620 amino acids was accomplished as generally described in Examples 3 and 4, with the
expression results presented below.

    M1BA               2000-3000 U/g
    M1KA               1000 U/L
    M1BK 620           31,950 U/g

Thus, a truncated β-like RT shows considerable activity, consistent with an increase in
solubility relative to the full-length native β RT.
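
The deletion sizes listed above for C-termini at positions 620 through 800 follow from
simple subtraction. The Python sketch below reproduces them under the assumption that the
reference β polypeptide is counted as 857 residues, the upper bound recited for SEQ ID NO:2
in claim 1; with the 858-residue figure quoted elsewhere in the text, each value would be
one larger. The constant and function names are hypothetical.

```python
REFERENCE_LENGTH = 857  # assumed residue count for the full-length beta RT


def residues_deleted(c_terminus: int,
                     reference_length: int = REFERENCE_LENGTH) -> int:
    """Number of C-terminal residues removed when the chain ends at c_terminus."""
    if not 1 <= c_terminus <= reference_length:
        raise ValueError("C-terminus must lie within the reference polypeptide")
    return reference_length - c_terminus


positions = [620, 640, 660, 740, 780, 800]
print([residues_deleted(p) for p in positions])
```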

Analogous modifications to the corresponding β polypeptides of other avian RTs
result in similarly increased RT activity.

    RSV-RT 620 his     33,000 U/g

In addition to 3' deletions resulting in polynucleotides encoding β-like polypeptides
having C-termini in the range of positions 580-832, and preferably in the range of 620-800
(SEQ ID NO:2), the invention contemplates polynucleotides having internal deletions relative
to the native β gene, as well as the polypeptides encoded by polynucleotides having such
internal deletions. The central core region of the integrase domain is associated with the DNA
cutting and joining properties of the native AMV-RT.

The core region of the integrase domain was deleted to varying extents (the region
between amino acids 620-770, 640-770 or 660-770 of SEQ ID NO:2), e.g., M1BK Cint lacks
amino acids 620-770 of SEQ ID NO:2, using conventional techniques. The approach
involved the initial construction of first polynucleotide fragments encoding C-terminally
truncated β-like fragments using PCR with the full-length AMV-RT pol gene as a template
(see Table II). Second fragments containing various lengths of the 3' end of the pol gene
(i.e., 3' fragments) were also constructed using PCR. These 3' fragments encoded the
C-terminal region of the integrase domain; some 3' fragments also contained part, but not all,
of the core region of the integrase domain. Those of skill in the art will recognize that the first
polynucleotide fragments, or 5' fragments, may encode peptide tags at their 5' ends; the 3'
fragments may also encode peptide tags (see 3' Fragment 2a in Table II), with or without
tags encoded by the 5' fragment; and these tag-encoding fragments are readily synthesized
using the PCR primers disclosed herein (e.g., F Cint XhoI (SEQ ID NO:85) and R Cint 830
His XhoI (SEQ ID NO:91)). The final step in generating constructs having internal deletions
was to ligate truncated β-like coding regions to 3' fragments in proper order and orientation,
as determined by the conventional screening of ligation products. In one embodiment, amino
acids 620-770 were deleted, thereby removing the core region of the integrase domain. The
C-terminal region of the integrase domain was then placed adjacent to the N-terminal region
of that domain.
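
At the protein level, an internal deletion of this kind amounts to removing a block of
residues and joining the two flanks. The Python sketch below illustrates the bookkeeping on
a placeholder sequence. It assumes that the quoted bounds (e.g., 620-770) are 1-based and
inclusive, which the text does not state, and it uses a made-up 858-residue string rather
than SEQ ID NO:2; the function name is hypothetical.

```python
def delete_internal(protein: str, first: int, last: int) -> str:
    """Remove residues first..last (1-based, inclusive) and join the flanks."""
    if not 1 <= first <= last <= len(protein):
        raise ValueError("deletion bounds fall outside the sequence")
    return protein[: first - 1] + protein[last:]


# Placeholder: 619 'N' residues, a 151-residue 'X' core, 88 'C' residues.
# This is NOT SEQ ID NO:2; the letters only mark which flank a residue came from.
placeholder = "N" * 619 + "X" * 151 + "C" * 88
deleted = delete_internal(placeholder, 620, 770)
print(len(placeholder), "->", len(deleted))
```

If the bounds were instead meant to exclude one or both endpoints, the retained length
would differ by one or two residues, so the convention should be checked against the
primer coordinates in Table II before relying on the numbers.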

Expression of such constructs in insect cells revealed an increase in solubility
(10-20%) and activity relative to the full-length, intact β RT, as shown below. Other deletions
effectively removing part or all of the central region of the integrase domain, such as the
deletion of amino acids 620-731, 640-771, 640-731, 660-771, 660-731, 680-771, 680-731,
and 740-771 (SEQ ID NO:2) are contemplated by the invention.

    M1KA               1000 U/L
    M1-5               200 U/L

Some modified beta fragments have terminal peptide tags. Thus, the invention
contemplates modified RTs having internal deletions and, optionally, peptide tags at an
N-terminus, a C-terminus or both termini. In addition, as for α-like modified RTs, the β-like
modified RTs may be derived from any Type II or Type III RT, along with polynucleotides
encoding them.

Any of the modified RTs of the invention may be produced by any process disclosed
herein or known in the art, such as in vivo synthesis, in vitro synthesis or chemical synthesis.
Further, any of these processes may be used to produce active polypeptides in a variety of
forms, including monomers, homo-dimers or homo-multimers, hetero-dimers, and
hetero-multimers, all of which are comprehended by the invention. In particular, expression of the
modified beta-like fragment M1BK620 Cint resulted in expression of a heterodimeric form
of RT, suggesting that the beta-like fragment was processed as expected, to yield an α
polypeptide in association with a modified β-like polypeptide. Expression of other modified
RTs of the invention, such as other core domain deletions (e.g., β-like fragments lacking amino
acids 620-731, 640-771, 640-731, 660-771, 660-731, 680-771, 680-731, or 740-771 of SEQ
ID NO:2) are expected to show activity in other than monomeric form, e.g., in heterodimeric
form. In addition, heterodimers or other non-monomeric forms may arise from the interaction
of a modified α-like polypeptide and a native β polypeptide, or from a modified α-like
polypeptide and a modified β-like polypeptide, regardless of whether the polypeptides were
produced by in vivo or in vitro expression, or by chemical synthesis.

While the present invention has been described in terms of specific embodiments, 
it is understood that variations and modifications will occur to those skilled in the art. 
Accordingly, only those limitations appearing in the appended claims should be placed 
upon the invention.

1. An isolated polynucleotide encoding a polypeptide having RNA-dependent
DNA polymerase activity, the polypeptide consisting of

(a) an amino acid sequence beginning at amino acid 1 and terminating
at any one of amino acids 428 to 857 of SEQ ID NO:2;

(b) an amino acid sequence beginning at amino acid 1 and terminating
at any one of amino acids 428 to 1054 of SEQ ID NO:39;

(c) an amino acid sequence beginning at amino acid 1 and terminating
at any one of amino acids 548 to 1198 of SEQ ID NO:41;

(d) an amino acid sequence beginning at amino acid 1 and terminating
at any one of amino acids 428 to 901 of SEQ ID NO:43; and

(e) variants, analogs and fragments of any of subparts (a) to (e) having
RNA-dependent DNA polymerase activity,

said polypeptide, variants, analogs, and fragments optionally having an
N-terminal methionine.

2. The polynucleotide according to claim 1 step (a) wherein said polypeptide
consists of a sequence that begins at about amino acid 1 and ends at about
amino acid 578 of SEQ ID NO:2.

3. The polynucleotide according to claim 1 step (a) wherein said polypeptide
consists of the sequence set forth as SEQ ID NO:4.

4. The polynucleotide according to claim 1 having a sequence selected from
the group consisting of a sequence set forth in any one of SEQ ID NOs: 1,
6-10, 38, 40, and 42.

5. The polynucleotide according to claim 1 wherein said polynucleotide is
DNA.

6. The polynucleotide according to claim 1 wherein said polynucleotide
encodes a polypeptide that lacks an effective integrase activity.

7. The polynucleotide according to claim 6 wherein said polynucleotide lacks
at least part of an integrase coding region.

8. The polynucleotide according to claim 1 further comprising an adjacent
polynucleotide encoding at least one terminal modification of said
polypeptide selected from the group consisting of an N-terminal
modification and a C-terminal modification.

9. The polynucleotide according to claim 8 wherein said modification is a
cysteine residue adjacent the C-terminus of said polypeptide.

10. The polynucleotide according to claim 8 wherein said adjacent
polynucleotide encodes a polypeptide consisting of a C-terminal
modification.

11. The polynucleotide according to claim 10 wherein said C-terminal
polypeptide comprises between four and fifty amino acids and wherein said
polypeptide comprises a domain selected from the group consisting of a
DNA binding domain, an RNA binding domain, a metal binding domain,
a structure stabilizing domain, and a polymerizing domain.

12. The polynucleotide according to claim 11 wherein said polypeptide
comprises an acidic amino acid domain, a basic amino acid domain, a W
domain, a WH domain, a zinc-finger-like domain, a leucine zipper domain,
a PPG domain, an NS1 domain, a GPRP domain, and a PRPG domain.

13. The polynucleotide according to claim 11 wherein said C-terminal peptide
comprises six amino acids.

14. The polynucleotide according to claim 11 wherein said C-terminal peptide
comprises amino acids that are the same.

15. The polynucleotide according to claim 11 wherein said C-terminal peptide
comprises amino acids that are basic.

16. The polynucleotide according to claim 15 wherein said basic amino acids
are histidine.

17. The polynucleotide according to claim 8 having a sequence selected from
the group consisting of a sequence set forth in any one of SEQ ID NOs: 11-19.

18. A vector comprising the polynucleotide according to claim 3.

19. The vector according to claim 18 wherein said polynucleotide is operably 
linked to a promoter. 

20. A host cell transformed with a vector according to claim 18. 

21. The host cell according to claim 20 wherein said host cell is a eukaryotic 
cell. 

22. The host cell according to claim 20 wherein said host cell is selected from 
the group consisting of Escherichia coli and an insect cell. 

23. An isolated polypeptide encoded by the polynucleotide according to any one 
of claims 1 to 5. 

24. An isolated polypeptide encoded by the polynucleotide according to any 
one of claims 6 to 17. 

25. A method of transforming host cells comprising the following steps: 

(a) introducing a vector according to claim 18 into host cells; 

(b) incubating said host cells; and

(c) identifying host cells containing said vector, thereby identifying a
transformed host cell.

26. A method of producing an isolated Reverse Transcriptase polypeptide 
comprising the following steps: 

(a) transforming a host cell with a vector according to claim 18; 
(b) incubating said host cell under conditions suitable for expression of
a polypeptide; and

(c) recovering said polypeptide, thereby producing an isolated Reverse 
Transcriptase. 

27. In a method for copying a target nucleic acid by extending a target nucleic 
acid bound primer in the presence of a polymerase, the improvement 
comprising: 

(a) contacting said target nucleic acid and primer with the polypeptide 
according to any one of claims 23 and 24. 

28. The method according to claim 27 wherein said copying produces a 
plurality of copies of said target nucleic acid. 

29. The method according to claim 27 wherein said polypeptide is in a form 
selected from the group consisting of a monomer and a polymer. 

30. The method according to claim 27 wherein said method is selected from the
group consisting of cDNA synthesis, Polymerase Chain Reaction,
Polymerase Chain Reaction-Reverse Transcription, Inverse Polymerase
Chain Reaction, Multiplex Polymerase Chain Reaction, Strand
Displacement Amplification, Multiplex Strand Displacement Amplification,
Nucleic Acid Sequence-Based Amplification, Sequence-Specific Strand
Replication and Rapid Amplification.

31. In a method for sequencing a target nucleic acid by extending a target 
nucleic acid-bound primer, the improvement comprising: 

(a) contacting said target nucleic acid and primer with the polypeptide
according to any one of claims 23 and 24.

32. The method according to claim 31 wherein said polypeptide is in a form 
selected from the group consisting of a monomer and a polymer. 

33. A kit for copying a target nucleic acid comprising: 

(a) one or more nucleotides, and 

(b) a polypeptide encoded by a polynucleotide having a sequence selected
from the group consisting of SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID
NO: 8, SEQ ID NO: 9, SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID
NO: 13, SEQ ID NO: 38, SEQ ID NO: 40, SEQ ID NO: 42 and
polynucleotide derivatives thereof encoding C-terminal modifications
at their 3' ends.

SEQUENCE LISTING



Swam ma than, Noo I n (inventor) 
MOLECULAR BIOLOGY RESOURCES, INC.



BIOLOGICALLY ACTIVE REVERSE TRANSCRIPTASES



<130> 28003/34938

<140>
<141>

<150> 60/116,099
<151> 1999-01-15

<170> PatentIn Ver. 2.0

<210> 1
<211> 3152
<212> DNA
<213> myeloblastosis-associated virus

<220>
<221> CDS
<222> (253)..(2937)
<223> full-length MAV-RT translated polynucleotide



agqattgg cccaccgatt gg.cagtgat ggaggccgcq aacccgcaga tccatgggat faO 



aggaggggga 



atgc gaaaatctcg tgacatgata gaqttggggg tt.attaaccg 120 



agaegggtc. ttggagcgac ccctgctcct cttccccgca qtagctatgg ttagagggag 180 
tatcctagga agagattgtc tgcagggcct agggctccgc ttgacaaatt tatagggagg 240



gccactgttc tc act gtt gcg cta cat ctg gct att ccg ctc aaa tgg aag 291
Thr Val Ala Leu His Leu Ala Ile Pro Leu Lys Trp Lys
cag tgg
Gln Trp

ggt
Gly

tta gtg gaa aaa gaa tta cag tta gga
Leu Val Glu Lys Glu Leu Gln Leu Gly

agt tgc tgg aac aca cct gtc ttt gtg atc
Ser Cys Trp Asn Thr Pro Val Phe Val Ile



cgg aag gct tcc ggg tct tat cgc tta ttg cat gac ttg cgc gct gtt 483
Arg Lys Ala Ser Gly Ser Tyr Arg Leu Leu His Asp Leu Arg Ala Val

aac gct aag ctt gtt cct ttt ggg gcc gtc caa cag ggg gcg ccg gtt
Asn Ala Lys Leu Val Pro Phe Gly Ala Val Gln Gln Gly Ala Pro Val

ctc tcc gcg ctc ccg cgt ggt tgg ccc ctg atg gtc cta gac ctc aag 579
Leu Ser Ala Leu Pro Arg Gly Trp Pro Leu Met Val Leu Asp Leu Lys
95 100 105

gat tgc ttc ttt tct att cct ctt gcg gaa caa gat cgc gaa cgt ttt 627
Asp Cys Phe Phe Ser Ile Pro Leu Ala Glu Gln Asp Arg Glu Arg Phe
110 115 120 125

gca ttt acg ctc ccc tcc gtg aat aac cag gcc ccc gct cga agg ttc 675
Ala Phe Thr Leu Pro Ser Val Asn Asn Gln Ala Pro Ala Arg Arg Phe
130 135 140

caa tgg aag gtc ttg ccc caa ggg atg acc tgt tct ccc act atc tgt 723
Gln Trp Lys Val Leu Pro Gln Gly Met Thr Cys Ser Pro Thr Ile Cys
145 150

cag ttg ata gtg ggt caa ata ctt gag ccc ttg cga ctc aag cac cca 771
Gln Leu Ile Val Gly Gln Ile Leu Glu Pro Leu Arg Leu Lys His Pro
160 165 170

tct ctg cgc atg ttg cat tat atg gat gat ctt ttg cta gcc gcc tca 819
Ser Leu Arg Met Leu His Tyr Met Asp Asp Leu Leu Leu Ala Ala Ser
175 180 185
Leu Glu Ala Ala Gly Glu



teg cct qat aaq q i. c cag agg gag 



Asp Lys Va! Gin Arg Glu Pro 



r.lu Arc, Ala Gly Phe Thr He Ser Pro Asp 



>a caa tat ctt ggg tac aag tta ggt agt acq tat gta gca 
a l Gin Tyr Leu Gly Tyr Lys Leu Gly Ser Thr Tyr Val Ala Pro 
2 3 0 



235 



10 .,«.. g,c ctg eta gca gaa ccc agg ata gec acc ttg tgg gat qtt cag 1011 

V . riy i,. u VU Ala Glu Pro Arg He Ala Thr Leu Trp Asp Val Gin 
240 

aag ctg gtg ggg tea ct.t cag tgg ctt cgc c,a gcg tta gga ate ccg 1059 

Ly:. Leu Val Gly Ser Leu Gin Trp Leu Arg Pro Ala Leu Gly He Pro 

i<; ".c.. 260 265 



c.i cqa ctq atg ggc 



ccc ttt tat qag cag tta eg. ggg tea gat cct 1107 
Glu Gin Leu Arg Gly Ser Asp Pro 



Pro Atg Leu Met Gly Pro Phe Tyr Gl 

nor, 2 8 5 

270 275 28u 

aac gag gcg agg gaa tgg aat eta gae atg aaa atg gec tgg aga gag 1155 
-»<> A,n Glu Ala Arg Glu Trp Asn Leu Asp Met Lys Met Ala Trp Arg Glu 

oqs 300 

290 

ate qta cag etc age ace act get gee ttg gag ega tgg gae cct gec 1203 



1 It- Val Gin Leu Se 

305 



Thr Thr Ala Ala Leu Giu Arg Trp Asp Pro Ala 



310 315 



• ., ctq naa ouh qcq qtc get aga tgt gaa cag ggg gca ata ggg 12bl 

Pro Leu Glu Gly Ala Val Ala Arg Cys Glu Gin Gly Ala lie Giy 
320 325 330 



qtc ctq qqa cag gga ctq tec aca 



cac cca agg cca tgt ttg tgg 



Leu Gly Gin Gly Leu Ser Thr His Pro Arg Pro 
TV, 3.10 34 5 



Cys Leu Trp Leu 



gcg ttt act get tgg tta gaa gtg etc 1347 



Phe Ser Thr Gin Pro Thr Lys Ala Phe Thr Ala Trp Leu Glu Val Leu 

, ffl 3 65 

3S0 355 
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d-JC cr.t ttg ,-it.L act aag era cgt g::t teg gca gtg cga acc ttt gq: 13 9 5 
Thi Leu U-u lie Thr Lys Leu Arg Ala Ser Ala Val Arg Thr Phe Gly 

375 380 



ag gag gtt gat ate etc ctg ttg cct gca tgc ttt egg gag qac ctt 1443 
ys GJu Val Asp He Leu Leu Leu Pre Ala Cys Phe Arg Glu Asp Leu 



ccg gag ggg ate ctg tta gec ctt aag ggg ttt gca gga aaa 
Pro Glu Gly He Leu Leu Ala Leu Lys Gly Phe A] a Gly Lys 
400 405 410 



ate agg agt agt gac acg 



cca tct att ttt gac att gcg cgt cca ctg 



He Arg Ser Ser Asp Thr Pro Ser He Phe Asp He Ala Arg Pro Leu 
4 15 42 0 4 25 



gtt tct ctg aaa gtg agg gtt acc qac cac cct gtg ccg gga 



ccc 1587 



His Val Ser Leu Lys Val Arg Val Thr Asp His Pro Val Pro Gly Pro 
430 435 440 445 

act gtc ttt act gac gcc tcc tca agc acc cat aag ggg gtg gta gtc 1635
Thr Val Phe Thr Asp Ala Ser Ser Ser Thr His Lys Gly Val Val Val
450 455 460

tgg agg gag ggc cca agg tgg gag ata aaa gaa ata gct gat ttg ggg 1683
Trp Arg Glu Gly Pro Arg Trp Glu Ile Lys Glu Ile Ala Asp Leu Gly
465 470 475

gca agt gta caa caa ctg gaa gca cgc gct gtg gcc atg gca ctt ctg 1731
Ala Ser Val Gln Gln Leu Glu Ala Arg Ala Val Ala Met Ala Leu Leu
480 485 490

ctg tgg ccg aca acg ccc act aat gta gtg act gac tcc gcg ttt gtt 1779
Leu Trp Pro Thr Thr Pro Thr Asn Val Val Thr Asp Ser Ala Phe Val
495 500 505

gcg aaa atg tta ctc aag atg gga cag gag gga gtc ccg tct aca gcg 1827
Ala Lys Met Leu Leu Lys Met Gly Gln Glu Gly Val Pro Ser Thr Ala
510 515 520 525

gcg gct ttt att tta gag gat gcg tta agc caa agg tca gcc atg gcc 1875
Ala Ala Phe Ile Leu Glu Asp Ala Leu Ser Gln Arg Ser Ala Met Ala
530 535 540
530 535 540 
gcc gtt ctc cac gtg cgg agt cat tct gaa gtg cca ggg ttt ttc aca 1923
Ala Val Leu His Val Arg Ser His Ser Glu Val Pro Gly Phe Phe Thr
545 550 555

gaa gga aat gac gtg gca gat agc caa gcc acc ttt caa gcg tat ccc 1971
Glu Gly Asn Asp Val Ala Asp Ser Gln Ala Thr Phe Gln Ala Tyr Pro
560 565 570

ttg aga gag gct aaa gat ctc cat acc gct ctc cat att gga ccc cgc 2019
Leu Arg Glu Ala Lys Asp Leu His Thr Ala Leu His Ile Gly Pro Arg
575 580 585

gcg cta tcc aaa gcg tgt aat ata tct atg cag cag gct agg gag gtt 2067
Ala Leu Ser Lys Ala Cys Asn Ile Ser Met Gln Gln Ala Arg Glu Val
590 595 600 605

gtt cag acc tgc ccg cat tgt aat tca gcc cct gcg ttg gag gcc ggg 2115
Val Gln Thr Cys Pro His Cys Asn Ser Ala Pro Ala Leu Glu Ala Gly
610 615 620

gta aac cct agg ggt ttg gga ccc cta cag ata tgg cag aca gac ttt 2163
Val Asn Pro Arg Gly Leu Gly Pro Leu Gln Ile Trp Gln Thr Asp Phe
625 630 635

aca ctt gag cct aga atg gcc ccc cgt tcc tgg ctc gct gtt act gtg
Thr Leu Glu Pro Arg Met Ala Pro Arg Ser Trp Leu Ala Val Thr Val
640 645 650

gat acc gcc tca tcg gcg ata gtc gta act cag cat ggc cgt gtc aca
Asp Thr Ala Ser Ser Ala Ile Val Val Thr Gln His Gly Arg Val Thr
655 660 665

tcg gtt gct gca caa cat cat tgg gcc acg gct atc gcc gtt ttg gga
Ser Val Ala Ala Gln His His Trp Ala Thr Ala Ile Ala Val Leu Gly
670 675 680 685

aga cca aag gcc ata aaa aca gat aat ggg tcc tgc ttc acg tct aaa
Arg Pro Lys Ala Ile Lys Thr Asp Asn Gly Ser Cys Phe Thr Ser Lys
690 695 700



tcc acg cga gag tgg ctc gcg aga tgg ggg ata gca cac acc acc ggg 2403
Ser Thr Arg Glu Trp Leu Ala Arg Trp Gly Ile Ala His Thr Thr Gly

atg gta
Met Val

aac cgg
Asn Arg



ctr -g aaa gat aag ate cgt gtg ctt gcg gag ggg gat gge ttt atg ^ 
Leu Leu Lys Asp Lys Ile Arg Val Leu Ala Glu Gly Asp Gly Phe Met
735 740 745

aaa aga atc ccc acc agc aaa cag ggg gaa cta tta gcc aag gca atg 2547
Lys Arg Ile Pro Thr Ser Lys Gln Gly Glu Leu Leu Ala Lys Ala Met
750 755 760 765

tat gcc ctc aat cac ttt gag cgt ggt gaa aac aca aaa aca ccg ata 2595
Tyr Ala Leu Asn His Phe Glu Arg Gly Glu Asn Thr Lys Thr Pro Ile
770 775 780

caa aaa cac tgg aga cct acc gtt ctt aca gaa gga ccc ccg gtt aaa 2643
Gln Lys His Trp Arg Pro Thr Val Leu Thr Glu Gly Pro Pro Val Lys
785 790 795

ata cga ata gag aca ggg gag tgg gaa aaa gga tgg aac gtg ctg gtc 2691
Ile Arg Ile Glu Thr Gly Glu Trp Glu Lys Gly Trp Asn Val Leu Val
800 805 810

tgg gga cga ggt tat gcc gct gtg aaa aac agg gac act gat aag gtt 2739
Trp Gly Arg Gly Tyr Ala Ala Val Lys Asn Arg Asp Thr Asp Lys Val
815 820 825

att tgg gta ccc tct cga aaa gtt aaa ccg gac atc acc caa aag gat 2787
Ile Trp Val Pro Ser Arg Lys Val Lys Pro Asp Ile Thr Gln Lys Asp
830 835 840 845

gag gtg act aag aaa gat gag gcg agc cct ctt ttt gca ggc att tct 2835
Glu Val Thr Lys Lys Asp Glu Ala Ser Pro Leu Phe Ala Gly Ile Ser
850 855 860

qa: Tgg gcg ccc tgg gaa ggc gag caa gaa gga etc caa gaa gaa acc 2883 
Asp Trp Ala Pro Trp Glu Gly Glu Gln Glu Gly Leu Gln Glu Glu Thr
865 870 875

gcc agc aac aag caa gaa aga ccc gga gaa gac acc cct gct gcc aac 2931
Ala Ser Asn Lys Gln Glu Arg Pro Gly Glu Asp Thr Pro Ala Ala Asn

880 885 890 
gag agt taattatatt ctcattattg gtgtcctggt cttgtgtgag gttacggggg 2987



taagagcctg aaLqttca 



ctcgaqca gccagggaac ctttggatta catgggccaa 3047 



ccgtacaggc caaacggatt tctgcctctc tacacagtca gccacctccc cttttcaaac 3107
atgtttgata ggtatcccgt ctcctatttc cgaaggtgat tttaa 3152



<210> 2
<211> 895
<212> PRT
<213> myeloblastosis-associated virus
<400> 2

Thr Val Ala Leu His Leu Ala Ile Pro Leu Lys Trp Lys Pro Asn His

Thr Pro Val Trp Ile Asp Gln Trp Pro Leu Pro Glu Gly Lys Leu Val

Ala Leu Thr Gln Leu Val Glu Lys Glu Leu Gln Leu Gly His Ile Glu

Pro Ser Leu Ser Cys Trp Asn Thr Pro Val Phe Val Ile Arg Lys Ala

Ser Gly Ser Tyr Arg Leu Leu His Asp Leu Arg Ala Val Asn Ala Lys
65 70 75 80

Leu Val Pro Phe Gly Ala Val Gln Gln Gly Ala Pro Val Leu Ser Ala

Leu Pro Arg Gly Trp Pro Leu Met Val Leu Asp Leu Lys Asp Cys Phe
100 105 110

Phe Ser Ile Pro Leu Ala Glu Gln Asp Arg Glu Arg Phe Ala Phe Thr
115 120 125

Leu Pro Ser Val Asn Asn Gln Ala Pro Ala Arg Arg Phe Gln Trp Lys
130 135 140



PCT/US00/00896 

WO 00/42199



Val Leu Pro Gln Gly Met Thr Cys Ser Pro Thr Ile Cys Gln Leu Ile
145 150

Val Gly Gln Ile Leu Glu Pro Leu Arg Leu Lys His Pro Ser Leu Arg
165 170

Met Leu His Tyr Met Asp Asp Leu Leu Leu Ala Ala Ser Ser His Asp
180 185 190

Gly Leu Glu Ala Ala Gly Glu Glu Val Ile Ser Thr Leu Glu Arg Ala
195 200 205

Gly Phe Thr Ile Ser Pro Asp Lys Val Gln Arg Glu Pro Gly Val Gln
210 215 220

Tyr Leu Gly Tyr Lys Leu Gly Ser Thr Tyr Val Ala Pro Val Gly Leu
225 230 235 240

Val Ala Glu Pro Arg Ile Ala Thr Leu Trp Asp Val Gln Lys Leu Val
245 250 255

Gly Ser Leu Gln Trp Leu Arg Pro Ala Leu Gly Ile Pro Pro Arg Leu
260 265 270

Met Gly Pro Phe Tyr Glu Gln Leu Arg Gly Ser Asp Pro Asn Glu Ala
275 280 285

Arg Glu Trp Asn Leu Asp Met Lys Met Ala Trp Arg Glu Ile Val Gln
290 295 300

Leu Ser Thr Thr Ala Ala Leu Glu Arg Trp Asp Pro Ala Leu Pro Leu
305 310 315 320

Glu Gly Ala Val Ala Arg Cys Glu Gln Gly Ala Ile Gly Val Leu Gly
325 330 335

Gln Gly Leu Ser Thr His Pro Arg Pro Cys Leu Trp Leu Phe Ser Thr
340 345 350

Gln Pro Thr Lys Ala Phe Thr Ala Trp Leu Glu Val Leu Thr Leu Leu
355 360 365

Ile Thr Lys Leu Arg Ala Ser Ala Val Arg Thr Phe Gly Lys Glu Val
Asp Ile Leu Leu Leu Pro Ala Cys Phe Arg Glu Asp Leu Pro Leu Pro

Glu Gly Ile Leu Leu Ala Leu Lys Gly Phe Ala Gly Lys Ile Arg Ser
405 410 415

Ser Asp Thr Pro Ser Ile Phe Asp Ile Ala Arg Pro Leu His Val Ser
420 425 430

Leu Lys Val Arg Val Thr Asp His Pro Val Pro Gly Pro Thr Val Phe
435 440 445

Thr Asp Ala Ser Ser Ser Thr His Lys Gly Val Val Val Trp Arg Glu
450 455 460

Gly Pro Arg Trp Glu Ile Lys Glu Ile Ala Asp Leu Gly Ala Ser Val
465 470 475 480

Gln Gln Leu Glu Ala Arg Ala Val Ala Met Ala Leu Leu Leu Trp Pro
485 490 495

Thr Thr Pro Thr Asn Val Val Thr Asp Ser Ala Phe Val Ala Lys Met
500 505 510

Leu Leu Lys Met Gly Gln Glu Gly Val Pro Ser Thr Ala Ala Ala Phe
515 520 525

Ile Leu Glu Asp Ala Leu Ser Gln Arg Ser Ala Met Ala Ala Val Leu
530 535 540

His Val Arg Ser His Ser Glu Val Pro Gly Phe Phe Thr Glu Gly Asn
545 550 555 560

Asp Val Ala Asp Ser Gln Ala Thr Phe Gln Ala Tyr Pro Leu Arg Glu
565 570 575

Ala Lys Asp Leu His Thr Ala Leu His Ile Gly Pro Arg Ala Leu Ser
580 585 590

Lys Ala Cys Asn Ile Ser Met Gln Gln Ala Arg Glu Val Val Gln Thr
595 600 605

Cys Pro His Cys Asn Ser Ala Pro Ala Leu Glu Ala Gly Val Asn Pro
610 615 620
Arg Gly Leu Gly Pro Leu Gln Ile Trp Gln Thr Asp Phe Thr Leu Glu
625 630 635 640

Pro Arg Met Ala Pro Arg Ser Trp Leu Ala Val Thr Val Asp Thr Ala
645 650 655

Ser Ser Ala Ile Val Val Thr Gln His Gly Arg Val Thr Ser Val Ala
660 665 670

Ala Gln His His Trp Ala Thr Ala Ile Ala Val Leu Gly Arg Pro Lys
675 680 685

Ala Ile Lys Thr Asp Asn Gly Ser Cys Phe Thr Ser Lys Ser Thr Arg
690 695 700

Glu Trp Leu Ala Arg Trp Gly Ile Ala His Thr Thr Gly Ile Pro Gly
705 710 715 720

Asn Ser Gln Gly Gln Ala Met Val Glu Arg Ala Asn Arg Leu Leu Lys
725 730 735

Asp Lys Ile Arg Val Leu Ala Glu Gly Asp Gly Phe Met Lys Arg Ile
740 745 750

Pro Thr Ser Lys Gln Gly Glu Leu Leu Ala Lys Ala Met Tyr Ala Leu
755 760 765

Asn His Phe Glu Arg Gly Glu Asn Thr Lys Thr Pro Ile Gln Lys His
770 775 780

Trp Arg Pro Thr Val Leu Thr Glu Gly Pro Pro Val Lys Ile Arg Ile
785 790 795 800

Glu Thr Gly Glu Trp Glu Lys Gly Trp Asn Val Leu Val Trp Gly Arg
805 810 815

Gly Tyr Ala Ala Val Lys Asn Arg Asp Thr Asp Lys Val Ile Trp Val
820 825 830

Pro Ser Arg Lys Val Lys Pro Asp Ile Thr Gln Lys Asp Glu Val Thr
835 840 845

Lys Lys Asp Glu Ala Ser Pro Leu Phe Ala Gly Ile Ser Asp Trp Ala
850 855 860






Glu Thr Ala Ser Asn

880 



<210> 3

<211> 896

<212> PRT

<213> myeloblastosis-associated virus
<220>

<223> full length + Met amino acid
<400> 3 

Met Thr Val Ala Leu His Leu Ala Ile Pro Leu Lys Trp Lys Pro Asn

His Thr Pro Val Trp Ile Asp Gln Trp Pro Leu Pro Glu Gly Lys Leu
20 25 30

Val Ala Leu Thr Gln Leu Val Glu Lys Glu Leu Gln Leu Gly His Ile

Glu Pro Ser Leu Ser Cys Trp Asn Thr Pro Val Phe Val Ile Arg Lys

Ala Ser Gly Ser Tyr Arg Leu Leu His Asp Leu Arg Ala Val Asn Ala

Lys Leu Val Pro Phe Gly Ala Val Gln Gln Gly Ala Pro Val Leu Ser

Ala Leu Pro Arg Gly Trp Pro Leu Met Val Leu Asp Leu Lys Asp Cys
100 105 110

Phe Phe Ser Ile Pro Leu Ala Glu Gln Asp Arg Glu Arg Phe Ala Phe
115 120 125

Thr Leu Pro Ser Val Asn Asn Gln Ala Pro Ala Arg Arg Phe Gln Trp
130 135 140

Lys Val Leu Pro Gln Gly Met Thr Cys Ser Pro Thr Ile Cys Gln Leu
Ile Val Gly Gln Ile Leu Glu Pro Leu Arg Leu Lys His Pro Ser Leu

Arg Met Leu His Tyr Met Asp Asp Leu Leu Leu Ala Ala Ser Ser His
180 185 190

Asp Gly Leu Glu Ala Ala Gly Glu Glu Val Ile Ser Thr Leu Glu Arg
195 200 205

Ala Gly Phe Thr Ile Ser Pro Asp Lys Val Gln Arg Glu Pro Gly Val

Gln Tyr Leu Gly Tyr Lys Leu Gly Ser Thr Tyr Val Ala Pro Val Gly
225 230 235 240

Leu Val Ala Glu Pro Arg Ile Ala Thr Leu Trp Asp Val Gln Lys Leu
245 250 255

Val Gly Ser Leu Gln Trp Leu Arg Pro Ala Leu Gly Ile Pro Pro Arg
265 270

Leu Met Gly Pro Phe Tyr Glu Gln Leu Arg Gly Ser Asp Pro Asn Glu
275 280 285

Ala Arg Glu Trp Asn Leu Asp Met Lys Met Ala Trp Arg Glu Ile Val
290 295 300

Gln Leu Ser Thr Thr Ala Ala Leu Glu Arg Trp Asp Pro Ala Leu Pro
305 310 315 320

Leu Glu Gly Ala Val Ala Arg Cys Glu Gln Gly Ala Ile Gly Val Leu
330 335

Gly Gln Gly Leu Ser Thr His Pro Arg Pro Cys Leu Trp Leu Phe Ser
340 345 350

Thr Gln Pro Thr Lys Ala Phe Thr Ala Trp Leu Glu Val Leu Thr Leu

Leu Ile Thr Lys Leu Arg Ala Ser Ala Val Arg Thr Phe Gly Lys Glu
370 375 380 
Val Asp Ile Leu Leu Leu Pro Ala Cys Phe Arg Glu Asp Leu Pro Leu
385

Pro Glu Gly Ile Leu Leu Ala Leu Lys Gly Phe Ala Gly Lys Ile Arg
405 410 415



Ser Ser Asp Thr Pro Ser Ile Phe Asp Ile Ala Arg Pro Leu His Val
420 425 430

Ser Leu Lys Val Arg Val Thr Asp His Pro Val Pro Gly Pro Thr Val
435 440 445

Phe Thr Asp Ala Ser Ser Ser Thr His Lys Gly Val Val Val Trp Arg
455 460

Glu Gly Pro Arg Trp Glu Ile Lys Glu Ile Ala Asp Leu Gly Ala Ser
465 470 475 480

Val Gln Gln Leu Glu Ala Arg Ala Val Ala Met Ala Leu Leu Leu Trp
490 495

Pro Thr Thr Pro Thr Asn Val Val Thr Asp Ser Ala Phe Val Ala Lys
505 510

Met Leu Leu Lys Met Gly Gln Glu Gly Val Pro Ser Thr Ala Ala Ala
515 520 525

Phe Ile Leu Glu Asp Ala Leu Ser Gln Arg Ser Ala Met Ala Ala Val
530 535 540

Leu His Val Arg Ser His Ser Glu Val Pro Gly Phe Phe Thr Glu Gly
545 550 555 560

Asn Asp Val Ala Asp Ser Gln Ala Thr Phe Gln Ala Tyr Pro Leu Arg
565 570 575

Glu Ala Lys Asp Leu His Thr Ala Leu His Ile Gly Pro Arg Ala Leu
585 590

Ser Lys Ala Cys Asn Ile Ser Met Gln Gln Ala Arg Glu Val Val Gln
595 600 605

Thr Cys Pro His Cys Asn Ser Ala Pro Ala Leu Glu Ala Gly Val Asn
610 615 620
Leu Gly Pro Leu Gln Ile Trp Gln Thr Asp Phe Thr Leu

Ala Ser Ser Ala Ile Val Val Thr Gln His Gly Arg Val Thr Ser Val
660 665 670

Ala Ala Gln His His Trp Ala Thr Ala Ile Ala Val Leu Gly Arg Pro
675 680 685

Lys Ala Ile Lys Thr Asp Asn Gly Ser Cys Phe Thr Ser Lys Ser Thr
690 695 700

Arg Glu Trp Leu Ala Arg Trp Gly Ile Ala His Thr Thr Gly Ile Pro
705 710 715 720

Gly Asn Ser Gln Gly Gln Ala Met Val Glu Arg Ala Asn Arg Leu Leu
725 730 735

Lys Asp Lys Ile Arg Val Leu Ala Glu Gly Asp Gly Phe Met Lys Arg
740 745 750

Ile Pro Thr Ser Lys Gln Gly Glu Leu Leu Ala Lys Ala Met Tyr Ala
755 760 765

Leu Asn His Phe Glu Arg Gly Glu Asn Thr Lys Thr Pro Ile Gln Lys
770 775 780

His Trp Arg Pro Thr Val Leu Thr Glu Gly Pro Pro Val Lys Ile Arg
785 790 795 800

Ile Glu Thr Gly Glu Trp Glu Lys Gly Trp Asn Val Leu Val Trp Gly
805 810 815

Arg Gly Tyr Ala Ala Val Lys Asn Arg Asp Thr Asp Lys Val Ile Trp
820 825 830

Val Pro Ser Arg Lys Val Lys Pro Asp Ile Thr Gln Lys Asp Glu Val
835 840 845

Thr Lys Lys Asp Glu Ala Ser Pro Leu Phe Ala Gly Ile Ser Asp Trp
850 855 860






Trp Glu Gly Glu Gln Glu Gly Leu Gln Glu Glu Thr Ala Ser



<212> PRT
<213> myeloblastosis-associated virus

<223> alpha (no met, no tag, no stop)
<400> 4

Thr Val Ala Leu His Leu Ala Ile Pro Leu Lys Trp Lys Pro Asn His
1 5 10 15

Thr Pro Val Trp Ile Asp Gln Trp Pro Leu Pro Glu Gly Lys Leu Val
20

Ala Leu Thr Gln Leu Val Glu Lys Glu Leu Gln Leu Gly His Ile Glu
35 40 45

Pro Ser Leu Ser Cys Trp Asn Thr Pro Val Phe Val Ile Arg Lys Ala

Ser Gly Ser Tyr Arg Leu Leu His Asp Leu Arg Ala Val Asn Ala Lys

Leu Val Pro Phe Gly Ala Val Gln Gln Gly Ala Pro Val Leu Ser Ala

Leu Pro Arg Gly Trp Pro Leu Met Val Leu Asp Leu Lys Asp Cys Phe

Phe Ser Ile Pro Leu Ala Glu Gln Asp Arg Glu Arg Phe Ala Phe Thr
115 120 125

Leu Pro Ser Val Asn Asn Gln Ala Pro Ala Arg Arg Phe Gln Trp Lys
Val Leu Pro Gln Gly Met Thr Cys Ser Pro Thr Ile Cys Gln Leu Ile

Val Gly Gln Ile Leu Glu Pro Leu Arg Leu Lys His Pro Ser Leu Arg
165 170 175

Met Leu His Tyr Met Asp Asp Leu Leu Leu Ala Ala Ser Ser His Asp
185 190

Gly Leu Glu Ala Ala Gly Glu Glu Val Ile Ser Thr Leu Glu Arg Ala
195 200 205

Gly Phe Thr Ile Ser Pro Asp Lys Val Gln Arg Glu Pro Gly Val Gln
210 215 220

Tyr Leu Gly Tyr Lys Leu Gly Ser Thr Tyr Val Ala Pro Val Gly Leu
225 230 235 240

Val Ala Glu Pro Arg Ile Ala Thr Leu Trp Asp Val Gln Lys Leu Val
245 250 255

Gly Ser Leu Gln Trp Leu Arg Pro Ala Leu Gly Ile Pro Pro Arg Leu
260 265 270

Met Gly Pro Phe Tyr Glu Gln Leu Arg Gly Ser Asp Pro Asn Glu Ala

Arg Glu Trp Asn Leu Asp Met Lys Met Ala Trp Arg Glu Ile Val Gln
290 295 300

Leu Ser Thr Thr Ala Ala Leu Glu Arg Trp Asp Pro Ala Leu Pro Leu
310 315 320

Glu Gly Ala Val Ala Arg Cys Glu Gln Gly Ala Ile Gly Val Leu Gly
325 330 335

Gln Gly Leu Ser Thr His Pro Arg Pro Cys Leu Trp Leu Phe Ser Thr
340 345 350

Gln Pro Thr Lys Ala Phe Thr Ala Trp Leu Glu Val Leu Thr Leu Leu
355 360 365
Ile Thr Lys Leu Arg Ala Ser Ala Val Arg Thr Phe Gly Lys Glu Val
380

Asp Ile Leu Leu Leu Pro Ala Cys Phe Arg Glu Asp Leu Pro Leu Pro
390 395

Glu Gly Ile Leu Leu Ala Leu Lys Gly Phe Ala Gly Lys Ile Arg Ser
410 415

Ser Asp Thr Pro Ser Ile Phe Asp Ile Ala Arg Pro Leu His Val Ser
420 425

Leu Lys Val Arg Val Thr Asp His Pro Val Pro Gly Pro Thr Val Phe
435 440 445

Thr Asp Ala Ser Ser Ser Thr His Lys Gly Val Val Val Trp Arg Glu
450 455 460

Gly Pro Arg Trp Glu Ile Lys Glu Ile Ala Asp Leu Gly Ala Ser Val
470 475 480

Gln Gln Leu Glu Ala Arg Ala Val Ala Met Ala Leu Leu Leu Trp Pro

Thr Thr Pro Thr Asn Val Val Thr Asp Ser Ala Phe Val Ala Lys Met
500 505 510

Leu Leu Lys Met Gly Gln Glu Gly Val Pro Ser Thr Ala Ala Ala Phe

Ile Leu Glu Asp Ala Leu Ser Gln Arg Ser Ala Met Ala Ala Val Leu
530 535 540

His Val Arg Ser His Ser Glu Val Pro Gly Phe Phe Thr Glu Gly Asn
550 555 560

Asp Val Ala Asp Ser Gln Ala Thr Phe Gln Ala Tyr Pro Leu Arg Glu
565 570 575

Ala Lys
<213> myeloblastosis-associated virus



Thr Val Ala Leu His Leu Ala Ile Pro Leu Lys Trp Lys Pro Asn His
1 5 10 15

Thr Pro Val Trp Ile Asp Gln Trp Pro Leu Pro Glu Gly Lys Leu Val
20 25 30

Ala Leu Thr Gln Leu Val Glu Lys Glu Leu Gln Leu Gly His Ile Glu
35 40 45

Pro Ser Leu Ser Cys Trp Asn Thr Pro Val Phe Val Ile Arg Lys Ala
50 55 60

Ser Gly Ser Tyr Arg Leu Leu His Asp Leu Arg Ala Val Asn Ala Lys
65 70 75 80

Leu Val Pro Phe Gly Ala Val Gln Gln Gly Ala Pro Val Leu Ser Ala

Leu Pro Arg Gly Trp Pro Leu Met Val Leu Asp Leu Lys Asp Cys Phe
100 105 110

Phe Ser Ile Pro Leu Ala Glu Gln Asp Arg Glu Arg Phe Ala Phe Thr
115 120 125

Leu Pro Ser Val Asn Asn Gln Ala Pro Ala Arg Arg Phe Gln Trp Lys
130 135 140

Val Leu Pro Gln Gly Met Thr Cys Ser Pro Thr Ile Cys Gln Leu Ile

Val Gly Gln Ile Leu Glu Pro Leu Arg Leu Lys His Pro Ser Leu Arg
165 170 175

Met Leu His Tyr Met Asp Asp Leu Leu Leu Ala Ala Ser Ser His Asp
180 185 190
Gly Leu Glu Ala Ala Gly Glu Glu Val Ile Ser Thr Leu Glu Arg Ala

Gly Phe Thr Ile Ser Pro Asp Lys Val Gln Arg Glu Pro Gly Val Gln
210 215

Tyr Leu Gly Tyr Lys Leu Gly Ser Thr Tyr Val Ala Pro Val Gly Leu
235 240

Val Ala Glu Pro Arg Ile Ala Thr Leu Trp Asp Val Gln Lys Leu Val

Gly Ser Leu Gln Trp Leu Arg Pro Ala Leu Gly Ile Pro Pro Arg Leu
260 265 270

Met Gly Pro Phe Tyr Glu Gln Leu Arg Gly Ser Asp Pro Asn Glu Ala

Arg Glu Trp Asn Leu Asp Met Lys Met Ala Trp Arg Glu Ile Val Gln
290 295 300

Leu Ser Thr Thr Ala Ala Leu Glu Arg Trp Asp Pro Ala Leu Pro Leu
310 315 320

Glu Gly Ala Val Ala Arg Cys Glu Gln Gly Ala Ile Gly Val Leu Gly
325 330 335

Gln Gly Leu Ser Thr His Pro Arg Pro Cys Leu Trp Leu Phe Ser Thr
340 345 350

Gln Pro Thr Lys Ala Phe Thr Ala Trp Leu Glu Val Leu Thr Leu Leu
355 360 365

Ile Thr Lys Leu Arg Ala Ser Ala Val Arg Thr Phe Gly Lys Glu Val
370 375 380

Asp Ile Leu Leu Leu Pro Ala Cys Phe Arg Glu Asp Leu Pro Leu Pro
385 390 395 400

Glu Gly Ile Leu Leu Ala Leu Lys Gly Phe Ala Gly Lys Ile Arg Ser
405 410 415

Ser Asp Thr Pro Ser Ile Phe Asp Ile Ala Arg Pro Leu His Val Ser
420 425 430




435 440 445 

Thr Asp Ala Ser Ser Ser Thr His Lys Gly Val Val Val Trp Arg Glu
455 460

Gly Pro Arg Trp Glu Ile Lys Glu Ile Ala Asp Leu Gly Ala Ser Val
470 475 480

Gln Gln Leu Glu Ala Arg Ala Val Ala Met Ala Leu Leu Leu Trp Pro
485 490 495

Thr Thr Pro Thr Asn Val Val Thr Asp Ser Ala Phe Val Ala Lys Met
500 505 510

515

530

His Val Arg Ser His Ser Glu Val Pro Gly Phe Phe Thr Glu Gly Asn
545 550 555

Asp Val Ala Asp Ser Gln Ala Thr Phe Gln Ala Tyr Pro Leu Arg Glu
565 570 575

Ala Lys Asp Leu His Thr Ala Leu His Ile Gly Pro Arg Ala Leu Ser
580 585 590

595 600 605

Arg Gly Leu Gly Pro Leu Gln Ile Trp Gln Thr Asp Phe Thr Leu Glu
625 630 635

Pro Arg Met Ala Pro Arg Ser Trp Leu Ala Val Thr Val Asp Thr Ala
645 650 655

Ser Ser Ala Ile Val Val Thr Gln His Gly Arg Val Thr Ser Val Ala
660 665
Ala Gln His His Trp Ala Thr Ala Ile Ala Val Leu Gly Arg Pro Lys

Ala Ile Lys Thr Asp Asn Gly Ser Cys Phe Thr Ser Lys Ser Thr Arg

Glu Trp Leu Ala Arg Trp Gly Ile Ala His Thr Thr Gly Ile Pro Gly
715 720

Asn Ser Gln Gly Gln Ala Met Val Glu Arg Ala Asn Arg Leu Leu Lys
725 730 735

Asp Lys Ile Arg Val Leu Ala Glu Gly Asp Gly Phe Met Lys Arg Ile
740 745 750

Pro Thr Ser Lys Gln Gly Glu Leu Leu Ala Lys Ala Met Tyr Ala Leu
755 765

Asn His Phe Glu Arg Gly Glu Asn Thr Lys Thr Pro Ile Gln Lys His
775 780

Trp Arg Pro Thr Val Leu Thr Glu Gly Pro Pro Val Lys Ile Arg Ile

Glu Thr Gly Glu Trp Glu Lys Gly Trp Asn Val Leu Val Trp Gly Arg
805 810 815

Gly Tyr Ala Ala Val Lys Asn Arg Asp Thr Asp Lys Val Ile Trp Val
820 825 830



<210> 6
<211> 1734
<212> DNA
<213> myeloblastosis-associated virus
<220>
<223> alpha coding region (no met, no tag, no stop)
<400> 6

actgttgcgc tacatctggc tattccgctc aaatggaagc caaaccacac gcctgtgtgg 60
attgaccagt ggccccttcc tgaaggtaaa cttgtagcgc taacgcaatt agtggaaaaa 120
gaattacagt taggacatat agaaccttca cttagttgct ggaacacacc tgtctttgtg 180
atccggaagg cttccgggtc ttatcgctta ttgcatgact tgcgcgctgt taacgctaag 240
cttgttcctt ttggggccgt ccaacagggg gcgccggttc tctccgcgct cccgcgtggt 300
tggcccctga tggtcctaga cctcaaggat tgcttctttt ctattcctct tgcggaacaa 360
gatcgcgaac gttttgcatt tacgctcccc tccgtgaata accaggcccc cgctcgaagg 420
ttccaatgga aggtcttgcc ccaagggatg acctgttctc ccactatctg tcagttgata 480
gtgggtcaaa tacttgagcc cttgcgactc aagcacccat ctctgcgcat gttgcattat 540
atggatgatc ttttgctagc cgcctcaagt catgatgggt tggaagcggc aggggaggag 600
gttatcagta cattggaaag agccgggttc accatttcgc ctgataaggt ccagagggag 660
cccggagtac aatatcttgg gtacaagtta ggtagtacgt atgtagcacc cgtaggcctg 720
gtagcagaac ccaggatagc caccttgtgg gatgttcaga agctggtggg gtcacttcag 780
tggcttcgcc cagcgttagg aatcccgcca cgactgatgg gcccctttta tgagcagtta 840
cgagggtcag atcctaacga ggcgagggaa tggaatctag acatgaaaat ggcctggaga 900
gagatcgtac agctcagcac cactgctgcc ttggagcgat gggaccctgc cctgcctctg 960
gaaggagcgg tcgctagatg tgaacagggg gcaatagggg tcctgggaca gggactgtcc 1020
acacacccaa ggccatgttt gtggctattc tccacccaac ccaccaaggc gtttactgct 1080
tggttagaag tgctcaccct tttgattact aagctacgtg cttcggcagt gcgaaccttt 1140
ggcaaggagg ttgatatcct cctgttgcct gcatgctttc gggaggacct tccgctcccg 1200
gaggggatcc tgttagccct taaggggttt gcaggaaaaa tcaggagtag tgacacgcca 1260
tctatttttg acattgcgcg tccactgcat gtttctctga aagtgagggt taccgaccac 1320
cctgtgccgg gacccactgt ctttactgac gcctcctcaa gcacccataa gggggtggta 1380
gtctggaggg agggcccaag gtgggagata aaagaaatag ctgatttggg ggcaagtgta 1440
caacaactgg aagcacgcgc tgtggccatg gcacttctgc tgtggccgac aacgcccact 1500
aatgtagtga ctgactccgc gtttgttgcg aaaatgttac tcaagatggg acaggaggga 1560
gtcccgtcta cagcggcggc ttttatttta gaggatgcgt taagccaaag gtcagccatg 1620
gccgccgttc tccacgtgcg gagtcattct gaagtgccag ggtttttcac agaaggaaat 1680
gacgtggcag atagccaagc cacctttcaa gcgtatccct tgagagaggc taaa 1734



<210> 7
<211> 1737
<212> DNA
<213> myeloblastosis-associated virus

<223> alpha coding region (no met, no tag, stop)
<400> 7
actgttgcgc tacatctggc tattccgctc aaatggaagc caaaccacac gcctgtgtgg 60
attgaccagt ggccccttcc tgaaggtaaa cttgtagcgc taacgcaatt agtggaaaaa 120
gaattacagt taggacatat agaaccttca cttagttgct ggaacacacc tgtctttgtg 180
atccggaagg cttccgggtc ttatcgctta ttgcatgact tgcgcgctgt taacgctaag 240
cttgttcctt ttggggccgt ccaacagggg gcgccggttc tctccgcgct cccgcgtggt 300
tggcccctga tggtcctaga cctcaaggat tgcttctttt ctattcctct tgcggaacaa 360
gatcgcgaac gttttgcatt tacgctcccc tccgtgaata accaggcccc cgctcgaagg 420
ttccaatgga aggtcttgcc ccaagggatg acctgttctc ccactatctg tcagttgata 480
gtgggtcaaa tacttgagcc cttgcgactc aagcacccat ctctgcgcat gttgcattat 540
atggatgatc ttttgctagc cgcctcaagt catgatgggt tggaagcggc aggggaggag 600
gttatcagta cattggaaag agccgggttc accatttcgc ctgataaggt ccagagggag 660
cccggagtac aatatcttgg gtacaagtta ggtagtacgt atgtagcacc cgtaggcctg 720
gtagcagaac ccaggatagc caccttgtgg gatgttcaga agctggtggg gtcacttcag 780
tggcttcgcc cagcgttagg aatcccgcca cgactgatgg gcccctttta tgagcagtta 840
cgagggtcag atcctaacga ggcgagggaa tggaatctag acatgaaaat ggcctggaga 900
gagatcgtac agctcagcac cactgctgcc ttggagcgat gggaccctgc cctgcctctg 960
gaaggagcgg tcgctagatg tgaacagggg gcaatagggg tcctgggaca gggactgtcc 1020
acacacccaa ggccatgttt gtggctattc tccacccaac ccaccaaggc gtttactgct 1080 
tggttagaag tgctcaccct tttgattact aagctacgtg cttcggcagt gcgaaccttt 1140 
ggcaaggagg ttgatatcct cctgttgcct gcatgctttc gggaggacct tccgctcccg 1200 
gaggggatcc tgttagccct taaggggttt gcaggaaaaa tcaggagtag tgacacgcca 1260 
tctatttttg acattgcgcg tccactgcat gtttctctga aagtgagggt taccgaccac 1320 
cctgtgccgg gacccactgt ctttactgac gcctcctcaa gcacccataa gggggtggta 1380 
gtctggaggg agggcccaag gtgggagata aaagaaatag ctgatttggg ggcaagtgta 1440 
caacaactgg aagcacgcgc tgtggccatg gcacttctgc tgtggccgac aacgcccact 1500
aatgtagtga ctgactccgc gtttgttgcg aaaatgttac tcaagatggg acaggaggga 1560 
gtcccgtcta cagcggcggc ttttatttta gaggatgcgt taagccaaag gtcagccatg 1620 
gccgccgttc tccacgtgcg gagtcattct gaagtgccag ggtttttcac agaaggaaat 1680 
gacgtggcag atagccaagc cacctttcaa gcgtatccct tgagagaggc taaataa 1737 



<210> 8 

<211> 2496 

<212> DNA 

<213> myeloblastosis-associated virus



<223> beta coding region (no met, no tag, no stop) 
<400> 8

actgutgcgc tacatctggc tattccgct;: aaatggaagc caaaccacac gcctgrgtgg GO 
,,-qacwgt ggcccctlcc Lgaaggcaaa cttgtagcgc Uacgcaiif.t agtggaaaaa 120 
gaattacagt taggacatat agaacctcca ctcagttgct ggaacacacc tgtctctgtg 180 
atccggaagg cttccgggtc tcatcgctta ttgcatgact tgcgcgctgt taacgctaag 240
cttgttcctt ttggggccgt ccaacagggg gcgccggttc tctccgcgct cccgcgtggt 300
tggcccctga tggtcctaga cctcaaggat tgcttctttt ctattcctct tgcggaacaa 360
gatcgcgaac gttttgcatt tacgctcccc tccgtgaata accaggcccc cgctcgaagg 420 
ttccaatgga aggtcttgcc ccaagggatg acctgttctc ccactatctg tcagttgata 480 
gtgggtcaaa tacttgagcc cttgcgactc aagcacccat ctctgcgcat gttgcattat 540
atggatgatc ttttgctagc cgcctcaagt catgatgggt tggaagcggc aggggaggag 600 
gttatcagta cattggaaag agccgggttc accatttcgc ctgataaggt ccagagggag 660 
cccggagtac aatatcttgg gtacaagtta ggtagtacgt atgtagcacc cgtaggcctg 720 
gtagcagaac ccaggatagc caccttgtgg gatgttcaga agctggtggg gtcacttcag 780 
tggcttcgcc cagcgttagg aatcccgcca cgactgatgg gcccctttta tgagcagtta 840
cgagggtcag atcctaacga ggcgagggaa tggaatctag acatgaaaat ggcctggaga 900
gagatcgtac agctcagcac cactgctgcc ttggagcgat gggaccctgc cctgcctctg 960
gaaggagcgg tcgctagatg tgaacagggg gcaatagggg tcctgggaca gggactgtcc 1020
acacacccaa ggccatgttt gtggctattc tccacccaac ccaccaaggc gtttactgct 1080
tggttagaag tgctcaccct tttgattact aagctacgtg cttcggcagt gcgaaccttt 1140
ggcaaggagg ttgatatcct cctgttgcct gcatgctttc gggaggacct tccgctcccg 1200
gaggggatcc tgttagccct taaggggttt gcaggaaaaa tcaggagtag tgacacgcca 1260
tctatttttg acattgcgcg tccactgcat gtttctctga aagtgagggt taccgaccac 1320
gcctcctcaa gcacccataa gggggtggta 1380
gtctggaggg agggcccaag gtgggagata aaagaaatag ctgatttggg ggcaagtgta 1440
caacaactgg aagcacgcgc tgtggccatg gcacttctgc tgtggccgac aacgcccact 1500
aatgtagtga ctgactccgc gtttgttgcg aaaatgttac tcaagatggg acaggaggga 1560
gtcccgtcta cagcggcggc ttttatttta gaggatgcgt taagccaaag gtcagccatg 1620
gccgccgttc tccacgtgcg gagtcattct gaagtgccag ggtttttcac agaaggaaat 1680
gacgtggcag atagccaagc cacctttcaa gcgtatccct tgagagaggc taaagatctc 1740
cataccgctc tccatattgg accccgcgcg ctatccaaag cgtgtaatat atctatgcag 1800
caggctaggg aggttgttca gacctgcccg cattgtaatt cagcccctgc gttggaggcc 1860
ggggtaaacc ctaggggttt gggaccccta cagatatggc agacagactt tacacttgag 1920
cctagaatgg ccccccgttc ctggctcgct gttactgtgg ataccgcctc atcggcgata 1980
gtcgtaactc agcatggccg tgtcacatcg gttgctgcac aacatcattg ggccacggct 2040
atcgccgttt tgggaagacc aaaggccata aaaacagata atgggtcctg cttcacgtct 2100
aaatccacgc gagagtggct cgcgagatgg gggatagcac acaccaccgg gattccgggt 2160
aattcccagg gtcaagctat ggtagagcgg gccaaccggc tcctgaaaga taagatccgt 2220
gtgcttgcgg agggggatgg ctttatgaaa agaatcccca ccagcaaaca gggggaacta 2280
ttagccaagg caatgtatgc cctcaatcac tttgagcgtg gtgaaaacac aaaaacaccg 2340
atacaaaaac actggagacc taccgttctc acagaaggac ccccggttaa aatacgaata 2400
gagacagggg agtgggaaaa aggatggaac gtgctggtct ggggacgagg ttatgccgct 2460
gtgaaaaaca gggacactga taaggttatt tgggta 2496

<210> 9
<211> 2499
<212> DNA






<213> myeloblastosis-associated virus



actgttgcgc tacatctggc tactccgctc aaatggaagc caaaccacac gcctgtgtgg 60
attgaccagt ggccccttcc tgaaggtaaa cttgtagcgc taacgcaatt agtggaaaaa 120
gaattacagt taggacatat agaaccttca cttagttgct ggaacacacc tgtctttgtg 180
atccggaagg cttccgggtc ttatcgctta ttgcatgact tgcgcgctgt taacgctaag 240
cttgttcctt ttggggccgt ccaacagggg gcgccggttc tctccgcgct cccgcgtggt 300
tggcccctga tggtcctaga cctcaaggat tgcttctttt ctattcctct tgcggaacaa 360
gatcgcgaac gttttgcatt tacgctcccc tccgtgaata accaggcccc cgctcgaagg 420
ttccaatgga aggtcttgcc ccaagggatg acctgttctc ccactatctg tcagttgata 480
gtgggtcaaa tacttgagcc cttgcgactc aagcacccat ctctgcgcat gttgcattat 540
atggatgatc ttttgctagc cgcctcaagt catgatgggt tggaagcggc aggggaggag 600
gttatcagta cattggaaag agccgggttc accatttcgc ctgataaggt ccagagggag 660

cccggagtac aatatcttgg gtacaagtta ggtagtacgt atgtagcacc cgtaggcctg 720
gtagcagaac ccaggatagc caccttgtgg gatgttcaga agctggtggg gtcacttcag 780
tggcttcgcc cagcgttagg aatcccgcca cgactgatgg gcccctttta tgagcagtta 840

cgagggtcag atcctaacga ggcgagggaa tggaatctag acatgaaaat ggcctggaga 900
gagatcgtac agctcagcac cactgctgcc ttggagcgat gggaccctgc cctgcctctg 960
gaaggagcgg tcgctagatg tgaacagggg gcaatagggg tcctgggaca gggactgtcc 1020
acacacccaa ggccatgttt gtggctattc tccacccaac ccaccaaggc gtttactgct 1080
tggctagaag tgctcaccct tttgattact aagctacgtg cttcggcagt gcgaaccttt 1140
ggcaaggagg ttgatatcct cctgttgcct gcatgctttc gggaggacct tccgctcccg 1200
gaggggatcc tgttagccct taaggggttt gcaggaaaaa tcaggagtag tgacacgcca 1260
tctatttttg acattgcgcg tccactgcat gtttctctga aagtgagggt taccgaccac 1320
cctgtgccgg gacccactgt ctttactgac gcctcctcaa gcacccataa gggggtggta 1380
gtctggaggg agggcccaag gtgggagata aaagaaatag ctgatttggg ggcaagtgta 1440
caacaactgg aagcacgcgc tgtggccatg gcacttctgc tgtggccgac aacgcccact 1500 
aatgtagtga ctgactccgc gtttgttgcg aaaatgttac tcaagatggg acaggaggga 1560 
gtcccgtcta cagcggcggc ttttatttta gaggatgcgt taagccaaag gtcagccatg 1620
gccgccgttc tccacgtgcg gagtcattct gaagtgccag ggtttttcac agaaggaaat 1680
gacgtggcag atagccaagc cacctttcaa gcgtatccct tgagagaggc taaagatctc 1740
cataccgctc tccatattgg accccgcgcg ctatccaaag cgtgtaatat atctatgcag 1800 
caggctaggg aggttgttca gacctgcccg cattgtaatt cagcccctgc gttggaggcc 1860 
ggggtaaacc ctaggggttt gggaccccta cagatatggc agacagactt tacacttgag 1920
cctagaatgg ccccccgttc ctggctcgct gttactgtgg ataccgcctc atcggcgata 1980 
gtcgtaactc agcatggccg tgtcacatcg gttgctgcac aacatcattg ggccacggct 2040
atcgccgttt tgggaagacc aaaggccata aaaacagata atgggtcctg cttcacgtct 2100 
aaatccacgc gagagtggct cgcgagatgg gggatagcac acaccaccgg gattccgggt 2160
aattcccagg gtcaagctat ggtagagcgg gccaaccggc tcctgaaaga taagatccgt 2220
gtgcttgcgg agggggatgg ctttatgaaa agaatcccca ccagcaaaca gggggaacta 2280
ttagccaagg caatgtatgc cctcaatcac tttgagcgtg gtgaaaacac aaaaacaccg 2340
atacaaaaac actggagacc taccgttctt acagaaggac ccccggttaa aatacgaata 2400
gagacagggg agtgggaaaa aggatggaac gtgctggtct ggggacgagg ttatgccgct 2460
gtgaaaaaca gggac






tgggtataa 
<210> 10
<211> 2688
<212> DNA

<213> myeloblastosis-associated virus

<223> full-length coding region (no met, no tag, stop)
<400> 10

actgttgcgc tacatctggc tattccgctc aaatggaagc caaaccacac gcctgtgtgg 60 
attgaccagt ggccccttcc tgaaggtaaa cttgtagcgc taacgcaatt agtggaaaaa 120
gaattacagt taggacatat agaaccttca cttagttgct ggaacacacc tgtctttgtg 180
atccggaagg cttccgggtc ttatcgctta ttgcatgact tgcgcgctgt taacgctaag 240
cttgttcctt ttggggccgt ccaacagggg gcgccggttc tctccgcgct cccgcgtggt 300 
tggcccctga tggtcctaga cctcaaggat tgcttctttt ctattcctct tgcggaacaa 360 
gatcgcgaac gttttgcatt tacgctcccc tccgtgaata accaggcccc cgctcgaagg 420
ttccaatgga aggtcttgcc ccaagggatg acctgttctc ccactatctg tcagttgata 480
gtgggtcaaa tacttgagcc cttgcgactc aagcacccat ctctgcgcat gttgcattat 540
atggatgatc ttttgctagc cgcctcaagt catgatgggt tggaagcggc aggggaggag 600 
gttatcagta cattggaaag agccgggttc accatttcgc ctgataaggt ccagagggag 660
cccggagtac aatatcttgg gtacaagtta ggtagtacgt atgtagcacc cgtaggcctg 720
gtagcagaac ccaggatagc caccttgtgg gatgttcaga agctggtggg gtcacttcag 780
tggcttcgcc cagcgttagg aatcccgcca cgactgatgg gcccctttta tgagcagtta 840
cgagggtcag atcctaacga ggcgagggaa tggaatctag acatgaaaat ggcctggaga 900
gagatcgtac agctcagcac cactgctgcc ttggagcgat gggaccctgc cctgcctctg 960
gaaggagcgg tcgctagatg tgaacagggg gcaatagggg tcctgggaca gggactgtcc 1020
acacacccaa ggccatgttt gtggctattc tccacccaac ccaccaaggc gtttactgct 1080

tggttagaag tgctcaccct tttgattact aagctacgtg cttcggcagt gcgaaccttt 1140
ggcaaggagg ttgatatcct cctgttgcct gcatgctttc gggaggacct tccgctcccg 1200
gaggggatcc tgttagccct taaggggttt gcaggaaaaa tcaggagtag tgacacgcca 1260
tctatttttg acattgcgcg tccactgcat gtttctctga aagtgagggt taccgaccac 1320
cctgtgccgg gacccactgt ctttactgac gcctcctcaa gcacccataa gggggtggta 1380
gtctggaggg agggcccaag gtgggagata aaagaaatag ctgatttggg ggcaagtgta 1440
caacaactgg aagcacgcgc tgtggccatg gcacttctgc tgtggccgac aacgcccact 1500
aatgtagtga ctgactccgc gtttgttgcg aaaatgttac tcaagatggg acaggaggga 1560
gtcccgtcta cagcggcggc ttttatttta gaggatgcgt taagccaaag gtcagccatg 1620
gccgccgttc tccacgtgcg gagtcattct gaagtgccag ggtttttcac agaaggaaat 1680
gacgtggcag atagccaagc cacctttcaa gcgtatccct tgagagaggc taaagatctc 1740
cataccgctc tccatattgg accccgcgcg ctatccaaag cgtgtaatat atctatgcag 1800
caggctaggg aggttgttca gacctgcccg cattgtaatt cagcccctgc gttggaggcc 1860
ggggtaaacc ctaggggttt gggaccccta cagatatggc agacagactt tacacttgag 1920
cctagaatgg ccccccgttc ctggctcgct gttactgtgg ataccgcctc atcggcgata 1980
gtcgtaactc agcatggccg tgtcacatcg gttgctgcac aacatcattg ggccacggct 2040
atcgccgttt tgggaagacc aaaggccata aaaacagata atgggtcctg cttcacgtct 2100
aaatccacgc gagagtggct cgcgagatgg gggatagcac acaccaccgg gattccgggt 2160
acttcccagg gtcaagctat ggtagagcgg gccaaccggc tcctgaaaga taagatccgt 2220
gtgcttgcgg agggggatgg ctttatgaaa agaatcccca ccagcaaaca gggggaacta 2280
ttagccaagg caatgtatgc cctcaatcac tttgagcgtg acgaaaacac aaaaacaccg 2340
atacaaaaac actggagacc taccgttctt acagaaggac ccccggttaa aatacgaata 2400
gagacagggg agtgggaaaa aggatggaac gtgctggtct ggggacgagg ttatgccgct 2460
gtgaaaaaca gggacactga taaggttatt tgggtaccct ctcgaaaagt taaaccggac 2520
atcacccaaa agaatgaggt gactaagaaa gatgaggcga gccctctttt tgcaggcatt 2580
tctgactggg cgccctggga aggcgagcaa gaaggactcc aagaagaaac cgccagcaac 2640
aagcaagaaa gacccggaga agacacccct gctgccaacg agagttaa 2688

<210> 11

<211> 2691

<212> DNA

<213> myeloblastosis-associated virus

<220>

<223> full-length coding region (met, no tag, stop)
<400> 11

atgactgttg cgctacatct ggctattccg ctcaaatgga agccaaacca cacgcctgtg 60
tggattgacc agtggcccct tcctgaaggt aaacttgtag cgctaacgca attagtggaa 120
aaagaattac agttaggaca tatagaacct tcacttagtt gctggaacac acctgtcttt 180
gtgatccgga aggcttccgg gtcttatcgc ttattgcatg acttgcgcgc tgttaacgct 240
aagcttgttc cttttggggc cgtccaacag ggggcgccgg ttctctccgc gctcccgcgt 300
ggttggcccc tgatggtcct agacctcaag gattgcttct tttctattcc tcttgcggaa 360
caagatcgcg aacgttttgc atttacgctc ccctccgtga ataaccaggc ccccgctcga 420
aggttccaat ggaaggtctt gccccaaggg atgacctgtt ctcccactat ctgtcagttg 480
atagtgggtc aaatacttga gcccttgcga ctcaagcacc catctctgcg catgttgcat 540
tatatggatg atcttttgct agccgcctca agtcatgatg ggttggaagc ggcaggggag 600
ttcaccattt cgcctgataa ggtccagagg 660
gagcccggag tacaatatct tgggtacaag ttaggtagta cgtatgtagc acccgtaggc 720
ctggtagcag aacccaggat agccaccttg tgggatgttc agaagctggt ggggtcactt 780
cagtggcttc gcccagcgtt aggaatcccg ccacgactga tgggcccctt ttatgagcag 840
ttacgagggt cagatcctaa cgaggcgagg gaatggaatc tagacatgaa aatggcctgg 900
agagagatcg tacagctcag caccactgct gccttggagc gatgggaccc tgccctgcct 960
ctggaaggag cggtcgctag atgtgaacag ggggcaatag gggtcctggg acagggactg 1020
tccacacacc caaggccatg tttgtggcta ttctccaccc aacccaccaa ggcgtttact 1080
gcttggttag aagtgctcac ccttttgatt actaagctac gtgcttcggc agtgcgaacc 1140
tttggcaagg aggttgatat cctcctgttg cctgcatgct ttcgggagga ccttccgctc 1200
ccggagggga tcctgttagc ccttaagggg tttgcaggaa aaatcaggag tagtgacacg 1260
ccatctattt ttgacattgc gcgtccactg catgtttctc tgaaagtgag ggttaccgac 1320
caccctgtgc cgggacccac tgtctttact gacgcctcct caagcaccca taagggggtg 1380
gtagtctgga gggagggccc aaggtgggag ataaaagaaa tagctgattt gggggcaagt 1440
gtacaacaac tggaagcacg cgctgtggcc atggcacttc tgctgtggcc gacaacgccc 1500
actaatgtag tgactgactc cgcgtttgtt gcgaaaatgt tactcaagat gggacaggag 1560
ggagtcccgt ctacagcggc ggcttttatt ttagaggatg cgttaagcca aaggtcagcc 1620
atggccgccg ttctccacgt gcggagtcat tctgaagtgc cagggttttt cacagaagga 1680
aatgacgtgg cagatagcca agccaccttt caagcgtatc ccttgagaga ggctaaagat 1740
ctccataccg ctctccatat tggaccccgc gcgctatcca aagcgtgtaa tatatctatg 1800
cagcaggcta gggaggttgt tcagacctgc ccgcattgta attcagcccc tgcgttggag 1860
gccggggtaa accctagggg tttgggaccc ctacagatat ggcagacaga ctttacactt 1920
gagcctagaa tggcccc
gctgttactg
tcggttgctg



tggataccgc ct



tcyy-g 1? 



10 




<210> 12 
<211> 2499
<212> DNA 

<213> myeloblastosis-associated virus



<223> beta coding region (met, no tag, no stop) 



<400> 12

atgactgttg cgctacatct ggctattccg ctcaaatgga agccaaacca cacgcctgtg 60



tggattgacc agtggcccct tcctgaaggt aaacttgtag cgctaacgca attagtggaa 120 
aaagaattac agttaggaca tatagaacct tcacttagtt gctggaacac acctgtcttt 180
gtgatccgga aggcttccgg gtcttatcgc ttattgcatg acttgcgcgc tgttaacgct 240
aagcttgttc cttttggggc cgtccaacag ggggcgccgg ttctctccgc gctcccgcgt 300
ggttggcccc tgatggtcct aaacctcaag gattgcttct tttctattcc tcttgcggaa 360
caagatcgcg aacgttttgc atttacgctc ccctccgtga ataaccaggc ccccgctcga 420
aggttccaat ggaaggtctt gccccaaggg atgacctgtt ctcccactat ctgtcagttg 480
atagtgggtc aaatacttga gcccttgcga ctcaagcacc catctctgcg catgttgcat 540
tatatggatg atcttttgct agccgcctca agtcatgatg ggttggaagc ggcaggggag 600
gaggttatca gtacattgga aagagccggg ttcaccattt cgcctgataa ggtccagagg 660 
gagcccggag tacaatatct tgggtacaag ttaggtagta cgtatgtagc acccgtaggc 720 
ctggtagcag aacccaggat agccaccttg tgggatgttc agaagctggt ggggtcactt 780 
cagtggcttc gcccagcgtt aggaatcccg ccacgactga tgggcccctt ttatgagcag 840
ttacgagggt cagatcctaa cgaggcgagg gaatggaatc tagacatgaa aatggcctgg 900 
agagagatcg tacagctcag caccactgct gccttggagc gatgggaccc tgccctgcct 960 
ctggaaggag cggtcgctag atgtgaacag ggggcaatag gggtcctggg acagggactg 1020 
tccacacacc caaggccatg tttgtggcta ttctccaccc aacccaccaa ggcgtttact 1080 
gcttggttag aagtgctcac ccttttgatt actaagctac gtgcttcggc agtgcgaacc 1140
tttggcaagg aggttgatat cctcctgttg cctgcatgct ttcgggagga ccttccgctc 1200
ccggagggaa tcctgttagc ccttaagggg tttgcaggaa aaatcaggaa tagtgacacg 1260
ccatctattt ttgacattgc gcgtccactg catgtttctc tgaaagtgag ggttaccgac 1320
caccctgtgc cgggacccac tgtctttact gacgcctcct caagcaccca taagggggtg 1380
gtagcctgga gggagggccc aaggtgggag ataaaagaaa tagctgattt gggggcaagt 1440
gtacaacaac tggaagcacg cgctgtggcc atggcacttc tgctgtggcc gacaacgccc 1500
actaatgtag tgactgactc cgcgtttgtt gcgaaaatgt tactcaagat gggacaggag 1560
ggagtcccgt ctacagcggc ggcttttatt ttagaggatg cgttaagcca aaggtcagcc 1620
atggccgccg ttctccacgt gcggagtcat tctgaagtgc cagggttttt cacagaagga 1680
aatgacgtgg cagatagcca agccaccttt caagcgtatc ccttgagaga ggctaaagat 1740
ctccataccg ctctccatat tggaccccgc gcgctatcca aagcgtgtaa tatatctatg 1800
cagcaggcta gggaggttgt tcagacctgc ccgcattgta attcagcccc tgcgttggag 1860
gccggggtaa accctagggg tttgggaccc ctacagatat ggcagacaga ctttacactt 1920
gagcctagaa tggccccccg ttcctggctc gctgttactg tggataccgc ctcatcggcg 1980
atagtcgtaa ctcagcatgg ccgtgtcaca tcggttgctg cacaacatca ttgggccacg 2040
gctatcgccg ttttgggaag accaaaggcc ataaaaacag ataatgggtc ctgcttcacg 2100
tctaaatcca cgcgagagtg gctcgcgaga tgggggatag cacacaccac cgggattccg 2160
ggtaattccc agggtcaagc tatggtagag cgggccaacc ggctcctgaa agataagatc 2220
cgtgtgcttg cggaggggga tggctttatg aaaagaatcc ccaccagcaa acagggggaa 2280
ctattagcca aggcaatgta tgccctcaat cactttgagc gtggtgaaaa cacaaaaaca 2340
ccgatacaaa aacactggag acctaccgtt cttacagaag gacccccggt taaaatacga 2400
atagagacag gggagtggga aaaaggatgg aacgtgctgg tctggggacg aggttatgcc 2460
gctgtgaaaa acagggacac tgataaggtt atttgggta 2499



<210> 13 
<211> 1737
<212> DNA 

<213> myeloblastosis-associated virus
<220> 

<223> alpha coding region (met, no tag, no stop)
<400> 13 

atgactgttg cgctacatct ggctattccg ctcaaatgga agccaaacca cacgcctgtg 60
tggattgacc agtggcccct tcctgaaggt aaacttgtag cgctaacgca attagtggaa 120
aaagaattac agttaggaca tatagaacct tcacttagtt gctggaacac acctgtcttt 180
gtgatccgga aggcttccgg gtcttatcgc ttattgcatg acttgcgcgc tgttaacgct 240
aagcttgttc cttttggggc cgtccaacag ggggcgccgg ttctctccgc gctcccgcgt 300
ggttggcccc tgatggtcct agacctcaag gattgcttct tttctattcc tcttgcggaa 360
caagatcgcg aacgttttgc atttacgctc ccctccgtga ataaccaggc ccccgctcga 420
aggttccaat ggaaggtctt gccccaaggg atgacctgtt ctcccactat ctgtcagttg 480
atagtgggtc aaatacttga gcccttgcga ctcaagcacc catctctgcg catgttgcat 540
tatatggatg atcttttgct agccgcctca agtcatgatg ggttggaagc ggcaggggag 600
gaggttatca gtacattgga aagagccggg ttcaccattt cgcctgataa ggtccagagg 660
gagcccggag tacaatatct tgggtacaag ttaggtagta cgtatgtagc acccgtaggc 720
ctggtagcag aacccaggat agccaccttg tgggatgttc agaagctggt ggggtcactt 780
cagtggcttc gcccagcgtt aggaatcccg ccacgactga tgggcccctt ttatgagcag 840
ttacgagggt cagatcctaa cgaggcgagg gaatggaatc tagacatgaa aatggcctgg 900
agagagatcg tacagctcag caccactgct gccttggagc gatgggaccc tgccctgcct 960
ctggaaggag cggtcgctag atgtgaacag ggggcaatag gggtcctggg acagggactg 1020
tccacacacc caaggccatg tttgtggcta ttctccaccc aacccaccaa ggcgtttact 1080
gcttggttag aagtgctcac ccttttgatt actaagctac gtgcttcggc agtgcgaacc 1140
tctggcaagg aggttgatat cctcctgttg cctgcatgct ttcgggagga ccttccgctc 1200
ccggagggga tcctgttagc ccttaagggg tttgcaggaa aaatcaggag tagtgacacg 1260
ccatctattt ttgacattgc gcgtccactg catgtttctc tgaaagcgag ggttaccgac 1320
caccctgtgc cgggacccac tgtctttact gacgcctcct caagcaccca taagggggtg 1380
gtagtctgga gggagggccc aaggtgggag ataaaagaaa tagctgattt gggggcaagt 1440



acttc tgctgtggcc



gccc 1500 



actaatgtag tgactgactc cgcgtttgtt gcgaaaatgt tactcaagat gggacaggag 1560

ggagtcccgt ctacagcggc ggcttttatt ttagaggatg cgttaagcca aaggtcagcc 1620

atggccgccg ttctccacgt gcggagtcat tctgaagtgc cagggttttt cacagaagga 1680

aatgacgtgg cagatagcca agccaccttt caagcgtatc ccttgagaga ggctaaa 1737

<210> 14 
<211> 2706
<212> DNA 

<213> myeloblastosis-associated virus
<220> 

<223> full-length coding region (met, his tag, no stop) 
<400> 14 

atgactgttg cgctacatct ggctattccg ctcaaatgga agccaaacca cacgcctgtg 60 
tggattgacc agtggcccct tcctgaaggt aaacttgtag cgctaacgca attagtggaa 120
aaagaattac agttaggaca tatagaacct tcacttagtt gctggaacac acctgtcttt 180 
gtgatccgga aggcttccgg gtcttatcgc ttattgcatg acttgcgcgc tgttaacgct 240 
aagcttgttc cttttggggc cgtccaacag ggggcgccgg ttctctccgc gctcccgcgt 300 
ggttggcccc tgatggtcct agacctcaag gattgcttct tttctattcc tcttgcggaa 360
caagatcgcg aacgttttgc atttacgctc ccctccgtga ataaccaggc ccccgctcga 420
aggttccaat ggaaggtctt gccccaaggg atgacctgtt ctcccactat ctgtcagttg 480
atagtgggtc aaatacttga gcccttgcga ctcaagcacc catctctgcg catgttgcat 540
tatatggatg atcttttgct agccgcctca agtcatgatg ggttggaagc ggcaggggag 600
gaggttatca gtacattgga aagagccggg ttcaccattt cgcctgataa ggtccagagg 660
gagcccggag tacaatatct tgggtacaag ttaggtagta cgtatgtagc acccgtaggc 720
ctggtagcag aacccaggat agccaccttg tgggatgttc agaagctggt ggggtcactt 780
cagtggcttc gcccagcgtt aggaatcccg ccacgactga tgggcccctt ttatgagcag 840
ttacgagggt cagatcctaa cgaggcgagg gaatggaatc tagacatgaa aatggcctgg 900
agagagatcg tacagctcag caccactgct gccttggagc gatgggaccc tgccctgcct 960
ctggaaggag cggtcgctag atgtgaacag ggggcaatag gggtcctggg acagggactg 1020
tccacacacc caaggccatg tttgtggcta ttctccaccc aacccaccaa ggcgtttact 1080
gcttggttag aagtgctcac ccttttgatt actaagctac gtgcttcggc agtgcgaacc 1140
tttggcaagg aggttgatat cctcctgttg cctgcatgct ttcgggagga ccttccgctc 1200
ccggagggga tcctgttagc ccttaagggg tttgcaggaa aaatcaggag tagtgacacg 1260
ccatctattt ttgacattgc gcgtccactg catgtttctc tgaaagtgag ggttaccgac 1320
caccctgtgc cgggacccac tgtctttact gacgcctcct caagcaccca taagggggtg 1380
gtagtctgga gggagggccc aaggtgggag ataaaagaaa tagctgattt gggggcaagt 1440
gtacaacaac tggaagcacg cgctgtggcc atggcacttc tgctgtggcc gacaacgccc 1500
actaatgtag tgactgactc cgcgtttgtt gcgaaaatgt tactcaagat gggacaggag 1560
ggagtcccgt ctacagcggc ggcttttatt ttagaggatg cgttaagcca aaggtcagcc 1620
atggccgccg ttctccacgt gcggagtcat tctgaagtgc cagggttttt cacagaagga 1680
aatgacgtgg cagatagcca agccaccttt caagcgtatc ccttgagaga ggctaaagat 1740
ctccataccg ctctccatat tggaccccgc gcgctatcca aagcgtgtaa tatatctatg 1800
cagcaggcta gggaggttgt tcagacctgc ccgcattgta attcagcccc tgcgttggag 1860
gccggggtaa accctagggg tttgggaccc ctacagatat ggcagacaga ctttacactt 1920
gagcctagaa tggccccccg ttcctggctc gctgttactg tggataccgc ctcatcggcg 1980
atagtcgtaa ctcagcatgg ccgtgtcaca tcggttgctg cacaacatca ttgggccacg 2040

tctaaatcca cgcgagagtg gctcgcgaga tgggggatag cacacaccac cgggattccg 2160
ggtaattccc agggtcaagc tatggtagag cgggccaacc ggctcctgaa agataagatc 2220
cgtgtgcttg cggaggggga tggctttatg aaaagaatcc ccaccagcaa acagggggaa 2280
ctattagcca aggcaatgta tgccctcaat cactttgagc gtggtgaaaa cacaaaaaca 2340
ccgatacaaa aacactggag acctaccgtt cttacagaag gacccccggt taaaatacga 2400
atagagacag gggagtggga aaaaggatgg aacgtgctgg tctggggacg aggttatgcc 2460
gctgtgaaaa acagggacac tgataaggtt atttgggtac cctctcgaaa agttaaaccg 2520
gacatcaccc aaaaggatga ggtgactaag aaagatgagg cgagccctct ttttgcaggc 2580
atttctgact gggcgccctg ggaaggcgag caagaaggac tccaagaaga aaccgccagc 2640
aacaagcaag aaagacccgg agaagacacc cctgctgcca acgagagtca ccaccaccac 2700

2706 



<210> 15

<211> 2517

<212> DNA

<213> myeloblastosis-associated virus



atgactgttg cgctacatct ggctattccg ctcaaatgga agccaaacca cacgcctgtg 60
tggattgacc agtggcccct tcctgaaggt aaacttgtag cgctaacgca attagtggaa 120
aaagaattac agttaggaca tatagaacct tcacttagtt gctggaacac acctgtcttt 180
gtgatccgga aggcttccgg gtcttatcgc ttattgcatg acctgcgcgc tgttaacgct 240
aagcttgttc cttttggggc cgtccaacag ggggcgccgg ttccctccgc gctcccgcgt 300
ggttggcccc tgatggtcct agacctcaag gattgcttct tttctattcc tcttgcggaa 360
caagatcgcg aacgttttgc atttacgctc ccctccgtga acaaccaggc ccccgctcga 420
aggttccaat ggaaggtctt gccccaaggg atgacctgtt ctcccactat ctgtcagttg 480
atagtgggtc aaatacttga gcccttgcga ctcaagcacc catctctgcg catgttgcat 540
tatatggatg atcttttgct agccgcctca agtcatgatg ggttggaagc ggcaggggag 600 
gaggttatca gtacattgga aagagccggg ttcaccattt cgcctgataa ggtccagagg 660
gagcccggag tacaatatct tgggtacaag ttaggtagta cgtatgtagc acccgtaggc 720
ctggtagcag aacccaggat agccaccttg tgggatgttc agaagctggt ggggtcactt 780
cagtggcttc gcccagcgtt aggaatcccg ccacgactga tgggcccctt ttatgagcag 840
ttacgagggt cagatcctaa cgaggcgagg gaatggaatc tagacatgaa aatggcctgg 900 
agagagatcg tacagctcag caccactgct gccttggagc gatgggaccc tgccctgcct 960 
ctggaaggag cggtcgctag atgtgaacag ggggcaatag gggtcctggg acagggactg 1020 
tccacacacc caaggccatg tttgtggcta ttctccaccc aacccaccaa ggcgtttact 1080
gcttggttag aagtgctcac ccttttgatt actaagctac gtgcttcggc agtgcgaacc 1140
tttggcaagg aggttgatat cctcctgttg cctgcatgct ttcgggagga ccttccgctc 1200
ccggagggga tcctgttagc ccttaagggg tttgcaggaa aaatcaggag tagtgacacg 1260
ccatctattt ttgacattgc gcgtccactg catgtttctc tgaaagtgag ggttaccgac 1320
caccctgtgc cgggacccac tgtctttact gacgcctcct caagcaccca taagggggtg 1380
gtagtctgga gggagggccc aaggtgggag ataaaagaaa tagctgattt gggggcaagt 1440
gtacaacaac tggaagcacg cgctgtggcc atggcacttc tgctgtggcc gacaacgccc 1500
actaatgtag tgactgactc cgcgtttgtt gcgaaaatgt tactcaagat gggacaggag 1560
ggagtcccgt ttacagcggc ggcttttatt ttagaggatg cgttaagcca aaggtcagcc 1620

atggccgccg ttctccacgt gcggagtcat tctgaagtgc cagggttttt cacagaagga 1680

aatgacgtgg cagatagcca agccaccttt caagcgtatc ccttgagaga ggctaaagat 1740

ctccataccg ctctccatat tggaccccgc gcgctatcca aagcgtgtaa tatatctatg 1800

cagcaggcta gggaggttgt tcagacctgc ccgcattgta attcagcccc tgcgttggag 1860

gccggggtaa accctagggg tttgggaccc ctacagatat ggcagacaga ctttacactt 1920

gagcctagaa tggccccccg ttcctggctc gctgttactg tggataccgc ctcatcggcg 1980

atagtcgtaa ctcagcatgg ccgtgtcaca tcggttgctg cacaacatca ttgggccacg 2040

gctatcgccg ttttgggaag accaaaggcc ataaaaacag ataatgggtc ctgcttcacg 2100 

tctaaatcca cgcgagagtg gctcgcgaga tgggggatag cacacaccac cgggattccg 2160 

ggtaattccc agggtcaagc tatggtagag cgggccaacc ggctcctgaa agataagatc 2220 

cgtgtgcttg cggaggggga tggctttatg aaaagaatcc ccaccagcaa acagggggaa 2280 

ctattagcca aggcaatgta tgccctcaat cactttgagc gtggtgaaaa cacaaaaaca 2340 

ccgatacaaa aacactggag acctaccgtt cttacagaag gacccccggt taaaatacga 2400 

atagagacag gggagtggga aaaaggatgg aacgtgctgg tctggggacg aggttatgcc 2460

gctgtgaaaa acagggacac tgataaggtt atttgggtac accaccacca ccaccac 2517 



<211> 1755

<212> DNA

<213> myeloblastosis-associated virus
<220>

<223> alpha coding region (met, his tag, no stop)



atgactgttg cgctacatct ggctattccg ctcaaatgga agccaaacca cacgcctgtg 60 




tggattgacc agtggcccct tcctgaaggt aaacttgtag cgctaacgca attagtggaa 120
aaagaattac agttaggaca tatagaacct tcacttagtt gctggaacac acctgtcttt 180
gtgatccgga aggcttccgg gtcttatcgc ttattgcatg acttgcgcgc tgttaacgct 240
aagcttgttc cttttggggc cgtccaacag ggggcgccgg ttctctccgc gctcccgcgt 300
ggttggcccc tgatggtcct agacctcaag gattgcttct tttctattcc tcttgcggaa 360
caagatcgcg aacgttttgc atttacgctc ccctccgtga ataaccaggc ccccgctcga 420
aggttccaat ggaaggtctt gccccaaggg atgacctgtt ctcccactat ctgtcagttg 480
atagtgggtc aaatacttga gcccttgcga ctcaagcacc catctctgcg catgttgcat 540
tatatggatg atcttttgct agccgcctca agtcatgatg ggttggaagc ggcaggggag 600
gaggttatca gtacattgga aagagccggg ttcaccattt cgcctgataa ggtccagagg 660
gagcccggag tacaatatct tgggtacaag ttaggtagta cgtatgtagc acccgtaggc 720
ctggtagcag aacccaggat agccaccttg tgggatgttc agaagctggt ggggtcactt 780
cagtggcttc gcccagcgtt aggaatcccg ccacgactga tgggcccctt ttatgagcag 840
ttacgagggt cagatcctaa cgaggcgagg gaatggaatc tagacatgaa aatggcctgg 900
agagagatcg tacagctcag caccactgct gccttggagc gatgggaccc tgccctgcct 960
ctggaaggag cggtcgctag atgtgaacag ggggcaatag gggtcctggg acagggactg 1020
tccacacacc caaggccatg tttgtggcta ttctccaccc aacccaccaa ggcgtttact 1080
gcttggttag aagtgctcac ccttttgatt actaagctac gtgcttcggc agtgcgaacc 1140
tttggcaagg aggttgatat cctcctgttg cctgcatgct ttcgggagga ccttccgctc 1200
ccggagggga tcctgttagc ccttaagggg tttgcaggaa aaatcaggag tagtgacacg 1260
ccatctattt ttgacattgc gcgtccactg catgtttctc tgaaagtgag ggttaccgac 1320
caccctgtgc cgggacccac tgtctttact gacgcctcct caagcaccca taagggggtg 1380




gtagtctgga gggagggccc aaggtgggag ataaaagaaa tagctgattt gggggcaagt 1440
gtacaacaac tggaagcacg cgctgtggcc atggcacttc tgctgtggcc gacaacgccc 1500
actaacgtag tgactgactc cgcgtttgtt gcgaaaatgt tactcaagat gggacaggag 1560
ggagtcccgt ctacagcggc ggcttttatt ttagaggatg cgttaagcca aaggtcagcc 1620
atggccgccg ttctccacgt gcggagtcat tctgaagtgc cagggttttt cacagaagga 1680 
aatgacgtgg cagatagcca agccaccttt caagcgtatc ccttgagaga ggctaaacac 1740

1755 



<210> 17 
<211> 2709 
<212> DNA

<213> myeloblastosis-associated virus
<220> 

<223> full-length coding region (met, his tag, stop) 
<400> 17 

atgactgttg cgctacatct ggctattccg ctcaaatgga agccaaacca cacgcctgtg 60 
tggattgacc agtggcccct tcctgaaggt aaacttgtag cgctaacgca attagtggaa 120
aaagaattac agttaggaca tatagaacct tcacttagtt gctggaacac acctgtcttt 180



gtgatccgga aggcttccgg gtcttatcgc ttattgcatg acttgcgcgc tgttaacgct 240 

aagcttgttc cttttggggc cgtccaacag ggggcgccgg ttctctccgc gctcccgcgt 300 

ggttggcccc tgatggtcct agacctcaag gattgcttct tttctattcc tcttgcggaa 360 

caagatcgcg aacgttttgc atttacgctc ccctccgtga ataaccaggc ccccgctcga 420

aggttccaat ggaaggtctt gccccaaggg atgacctgtt ctcccactat ctgtcagttg 480

atagtgggtc aaatacttga gcccttgcga ctcaagcacc catctctgcg catgttgcat 540
tatatggatg atcttttgct agccgcctca agtcatgatg ggttggaagc ggcaggggag 600 




gaggttatca gtacattgga aagagccggg ttcaccattt cgcctgataa ggtccagagg 660
aagcccggag tacaatatct tgggtacaag ttaggtagta cgtatgtagc acccgtaggc 720
ctggtagcag aacccaggat agccaccttg tgggatgttc agaagctggt ggggtcactt 780
cagtggcttc gcccagcgtt aggaatcccg ccacgactga tgggcccctt ttatgagcag 840
ttacgagggt cagatcctaa cgaggcgagg gaatggaatc tagacatgaa aatggcctgg 900
agagagatcg tacagctcag caccactgct gccttggagc gatgggaccc tgccctgcct 960
ctggaaggag cggtcgctag atgtgaacag ggggcaatag gggtcctggg acagggactg 1020
tccacacacc caaggccatg tttgtggcta ttctccaccc aacccaccaa ggcgtttact 1080
gcttggttag aagtgctcac ccttttgatt actaagctac gtgcttcggc agtgcgaacc 1140
tttggcaagg aggttgatat cctcctgttg cctgcatgct ttcgggagga ccttccgctc 1200
ccggagggga tcctgttagc ccttaagggg tttgcaggaa aaatcaggag tagtgacacg 1260
ccatctattt ttgacattgc gcgtccactg catgtttctc tgaaagtgag ggttaccgac 1320
caccctgtgc cgggacccac tgtctttact gacgcctcct caagcaccca taagggggtg 1380
gtagtctgga gggagggccc aaggtgggag ataaaagaaa tagctgattt gggggcaagt 1440
gtacaacaac tggaagcacg cgctgtggcc atggcacttc tgctgtggcc gacaacgccc 1500
actaatgtag tgactgactc cgcgtttgtt gcgaaaatgt tactcaagat gggacaggag 1560
ggagtcccgt ctacagcggc ggcttttatt ttagaggatg cgttaagcca aaggtcagcc 1620
atggccgccg ttctccacgt gcggagtcat tctgaagtgc cagggttttt cacagaagga 1680
aatgacgtgg cagatagcca agccaccttt caagcgtatc ccttgagaga ggctaaagat 1740
ctccataccg ctctccatat tggaccccgc gcgctatcca aagcgtgtaa tatatctatg 1800
cagcaggcta gggaggttgt tcagacctgc ccgcattgta attcagcccc tgcgttggag 1860
gccggggtaa accctagggg tttgggaccc ctacagatat ggcagacaga ctttacactt 1920
gagcctagaa tggccccccg ttcctggctc gctgttactg tggataccgc ctcatcggcg 1980
atagtcgtaa ctcagcatgg ccgtgtcaca tcggttgctg cacaacatca ttgggccacg 2040
gctatcgccg ttttgggaag accaaaggcc ataaaaacag ataatgggtc ctgcttcacg 2100
tctaaatcca cgcgagagtg gctcgcgaga tgggggatag cacacaccac cgggattccg 2160
ggtaattccc agggtcaagc tatggtagag cgggccaacc ggctcctgaa agataagatc 2220
cgtgtgcttg cggaggggga tggctttatg aaaagaatcc ccaccagcaa acagggggaa 2280
ctattagcca aggcaatgta tgccctcaat cactttgagc gtggtgaaaa cacaaaaaca 2340 
ccgatacaaa aacactggag acctaccgtt cttacagaag gacccccggt taaaatacga 2400 
atagagacag gggagtggga aaaaggatgg aacgtgctgg tctggggacg aggttatgcc 2460 
gctgtgaaaa acagggacac tgataaggtt atttgggtac cctctcgaaa agttaaaccg 2520 
gacatcaccc aaaaggatga ggtgactaag aaagatgagg cgagccctct ttttgcaggc 2580
atttctgact gggcgccctg ggaaggcgag caagaaggac tccaagaaga aaccgccagc 2640 
aacaagcaag aaagacccgg agaagacacc cctgctgcca acgagagtca ccaccaccac 2700 



caccactaa 2709

<210> 18
<211> 2520
<212> DNA

<213> myeloblastosis-associated virus

<223> beta coding region (met, his tag, stop)



<400> 18 

atgactgttg cgctacatct ggctattccg ctcaaatgga agccaaacca cacgcctgtg 60 

tggattgacc agtggcccct tcctgaaggt aaacttgtag cgctaacgca attagtggaa 120

aaagaattac agttaggaca tatagaacct tcacttagtt gctggaacac acctgtcttt 180
gtgatccgga aggcttccgg gtcttatcgc ttattgcatg acttgcgcgc tgttaacgct 240
aagcttgttc cttttggggc cgtccaacag ggggcgccgg ttctctccgc gctcccgcgt 300
ggttggcccc tgatggtcct agacctcaag gattgcttct tttctattcc tcttgcggaa 360
caagatcgcg aacgttttgc atttacgctc ccctccgtga ataaccaggc ccccgctcga 420
aggttccaat ggaaggtctt gccccaaggg atgacctgtt ctcccactat ctgtcagttg 480
atagtgggtc aaatacttga gcccttgcga ctcaagcacc catctctgcg catgttgcat 540
tatatggatg atcttttgct agccgcctca agtcatgatg ggttggaagc ggcaggggag 600
gaggttatca gtacattgga aagagccggg ttcaccattt cgcctgataa ggtccagagg 660
gagcccggag tacaatatct tgggtacaag ttaggtagta cgtatgtagc acccgtaggc 720
ctggtagcag aacccaggat agccaccttg tgggatgttc agaagctggt ggggtcactt 780
cagtggcttc gcccagcgtt aggaatcccg ccacgactga tgggcccctt ttatgagcag 840 
ttacgagggt cagatcctaa cgaggcgagg gaatggaatc tagacatgaa aatggcctgg 900 
agagagatcg tacagctcag caccactgct gccttggagc gatgggaccc tgccctgcct 960 
ctggaaggag cggtcgctag atgtgaacag ggggcaatag gggtcctggg acagggactg 1020 
15 tccacacacc caaggccatg tttgtggcta ttctccaccc aacccaccaa ggcgtttact 1080 
gcttggttag aaqtgctcac ccttttgatt actaagctac gtgct.tcggc agtgcgaacc 1140 
tttggcaagg aggttgatat cctcctgt.tg cctgcatgct tticgggagga ccttccgctc 1200 
ccggagggga tcctgttagc ccttaagggg tttgcaggaa aaatcaggag tagtgacacg 1260 
ccatctattt ttqacattqc gcgt.ccactg catgtttctx tgaaagtgag ggttaccgac 1320 
20 caccctgtgc cgggacccac tgtctttact gacgcctcct caagcaccca taagggggLg 1380 
gtagtctgqa gggagggccc aaggtgggag ataaaagaaa tagctgattt gggggcaagt J 440 
gtacaacaac tggaagcacg cgctgtggcc atggcacttc tgctguqqcc gacaacgccc 1500 
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tgactgactc cgcgtttgtt gcgai 



aagat gggacaggag 15 60 



ggagtcccgt ctacagcggc ggcttttatt ttagaggatg cgttaagcca aaggtcagcc 1620
atggccgccg ttctccacgt gcggagtcat tctgaagtgc cagggttttt cacagaagga 1680
aatgacgtgg cagatagcca agccaccttt caagcgtatc ccttgagaga ggctaaagat 1740
ctccataccg ctctccatat tggaccccgc gcgctatcca aagcgtgtaa tatatctatg 1800
cagcaggcta gggaggttgt tcagacctgc ccgcattgta attcagcccc tgcgttggag 1860
gccggggtaa accctagggg tttgggaccc ctacagatat ggcagacaga ctttacactt 1920
gagcctagaa tggccccccg ttcctggctc gctgttactg tggataccgc ctcatcggcg 1980
atagtcgtaa ctcagcatgg ccgtgtcaca tcggttgctg cacaacatca ttgggccacg 2040
gctatcgccg ttttgggaag accaaaggcc ataaaaacag ataatgggtc ctgcttcacg 2100
tctaaatcca cgcgagagtg gctcgcgaga tgggggatag cacacaccac cgggattccg 2160
ggtaattccc agggtcaagc tatggtagag cgggccaacc ggctcctgaa agataagatc 2220
cgtgtgcttg cggaggggga tggctttatg aaaagaatcc ccaccagcaa acagggggaa 2280
ctattagcca aggcaatgta tgccctcaat cactttgagc gtggtgaaaa cacaaaaaca 2340
ccgatacaaa aacactggag acctaccgtt cttacagaag gacccccggt taaaatacga 2400
atagagacag gggagtggga aaaaggatgg aacgtgctgg tctggggacg aggttatgcc 2460
gctgtgaaaa acagggacac tgataaggtt atttgggtac accaccacca ccaccactaa 2520



<210> 19
<211> 1758
<212> DNA
<213> myeloblastosis-associated virus

<223> alpha coding region (met, his tag, stop)
atgactgttg cactacatct ggctattccg ctcaaatgga agccaaacca cacgcctgtg 60
tggattgacc agtggcccct tcctgaaggt aaactcgtag cgctaacgca attagtgaaa 120
aaagaattac agttaggaca tatagaacct tcacttagtt gctggaacac acctgtcttt 180
gtgatccgga aggcttccgg gtcttatcgc ttattgcatg acttgcgcgc tgttaacgct 240
aagcttgttc cttttggggc cgtccaacag ggggcgccgg ttctctccgc gctcccgcgt 300
ggttggcccc tgatggtcct agacctcaag gattgcttct tttctattcc tcttgcggaa 360
caagatcgcg aacgttttgc atttacgctc ccctccgtga ataaccaggc ccccgctcga 420
aggttccaat ggaaggtctt gccccaaggg atgacctgtt ctcccactat ctgtcagttg 480
atagtgggtc aaatacttga gcccttgcga ctcaagcacc catctctgcg catgttgcat 540
tatatggatg atcttttgct agccgcctca agtcatgatg ggttggaagc ggcaggggag 600
gaggttatca gtacattgga aagagccggg ttcaccattt cgcctgataa ggtccagagg 660
gagcccggag tacaatatct tgggtacaag ttaggtagta cgtatgtagc acccgtaggc 720
ctggtagcag aacccaggat agccaccttg tgggatgttc agaagctggt ggggtcactt 780
cagtggcttc gcccagcgtt aggaatcccg ccacgactga tgggcccctt ttatgagcag 840
ttacgagggt cagatcctaa cgaggcgagg gaatggaatc tagacatgaa aatggcctgg 900
agagagatcg tacagctcag caccactgct gccttggagc gatgggaccc tgccctgcct 960
ctggaaggag cggtcgctag atgtgaacag ggggcaatag gggtcctggg acagggactg 1020
tccacacacc caaggccatg tttgtggcta ttctccaccc aacccaccaa ggcgtttact 1080
gcttggttag aagtgctcac ccttttgatt actaagctac gtgcttcggc agtgcgaacc 1140
tttggcaagg aggttgatat cctcctgttg cctgcatgct ttcgggagga ccttccgctc 1200
ccggagggga tcctgttagc ccttaagggg tttgcaggaa aaatcaggag tagtgacacg 1260
ccatctattt ttgacattgc gcgtccactg catgtttctc tgaaagtgag ggttaccgac 1320
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CtJ t . n , r . r ,|-.(i, k:c: ac t gtetLlact gacgcctcct caagcaccca taagggggt.g 1380 

al ., ,, ,., , , ,., yf rt --c ajqgtgggag ataaaagada lagccqattl gqqqgcaagt 14 4'.) 
„ta,-,„,c.M'- t.|(Kia'r:.i,-(] cgctgtggcc atggcacttc tgctgtggcc gacaacgccc 1S00 
actaat.qtaq t qactgactc cgcgtttgtt gcgaaaatgt tacrcaogat gggacaggag 1.S60 
ggciLjLccc-gt ■- : a q<~q gc ggcttttatt ttagaggatg cgttaagcca aaggtcagcc 1620 
at ircqrcq it.-: ..Mt g/ggagtcat tctgaagtgc cagggttttt cacagaagga 1680 
aatg.-icqt.qq r.i-iat ..q.-.-., uqcc.icct 1 1 caagcgtaLc ccttgagaga ggctaaacac 3 740 
"58 



<210> 20
<211> 42
<212> DNA
<213> Artificial Sequence

<223> Description of Artificial Sequence: FSDRT

<400> 20

tgtactaagg aggtgttcat gactgttgcg ctacatctgg ct



<210> 21
<211> 44
<212> DNA
<213> Artificial Sequence

<223> Description of Artificial Sequence: RSDBAC2

<400> 21

gccagatgta gcgcaacagt catatttata ggttttttta ttac



<211> 20
<212> DNA







<210> 23
<211> 57
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: M1BARSDHIS
<400> 23

acccggatca attaattagt ggtggtggtg gtggtgttta gcctctctca agggata



<210> 24
<211> 54
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: M1KARSDHIS
<400> 24

acccggatca attaattagt ggtggtggtg gtggtgccaa ataaccttat cagt



<211> 44
<212> DNA
<213> Artificial Sequence

<220>

<223> Description of Artificial Sequence: FM1BASmaI

<400> 25

ataagggcca ctgttctccc cgggatgact gttgcgctgc atct
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Artificial Sequence 



<223> Description of Artificial Sequence: primer-M1BA



10 ■ 4-m( • 21 

ttl.q-JCtct ctcuuqcjgat 



• : ; i 21 

<213> Artificial Sequence



<223> Description of Artificial Sequence: primer-M1KA



cct C atcaq 



20 oio> 2 9 



<212> DNA

<213> Artificial Sequence



<223> Description of Artificial Sequence: sequencing primer or FSP

ujqqtt ttcccaqtca cga



-.2 I 0 > 



<213> Artificial Sequence

capture primer

<400> 30

aaactatgcc aactagagat tggaggttgt tt



<210> 31
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: amplification
primer 1

<400> 31

accccatcca atgcatgtct cgggtcgtag tcttaaccat



<210> 32
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: amplification
primer 2

<400> 32

cgattccgct ccagacttct cgggtgctga aggagtaagg

<210> 33
<211> 23
<212> DNA
<213> Fllhis

<400> 33
ggccacacca ccaccaccac cac
<400> 34

ggccgtggtg gtggtggtgg tgt



<210> 35 

< 2 1 1 > 6-1 

<212> DNA

<213> RM1KAhisKpnI



<210> 36 

<211> 66 

<212> DNA 

<213> RMIAAhisAccI 



<400> 36

aaaataaaag ccgccgctgt cgacttagtg gtggtggtgg tggtggactc cctcctgtcc 60 



<210> 37
<211> 40
<212> DNA
<213> Artificial Sequence
<220>
<223> Description of Artificial Sequence: HRP-conjugated
P2 comp

<400> 37

ccttactcct tcagcacccg agaagtctgg agcggaatcg
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tgg aca gca agg aca cat cat gtc aaa atg ccc aga 
Trp Thr Ala Arg Thr His His Val Lys Met Pro Arg 



10 



,ca q,c ggg ttt ttt agg gtt egg ccc ctg ggg aaa gaa gec teg 96 
l.y^ Thr Giy Gly Phe Phe Arg Val Arg Pro Leu Gly Lys Glu Ala Ser 
20 



25 30 



caa ttt. ccc cgt cca ggc acc cca ggg gat agt gec ate tgc gee cce 144 
G i„ Phe Pro Arq Pro Gly Thr Pro Gly Asp Ser Ala lie Cys Ala Pro 

45 



40 



egg cat gac acc tea ggg tgc gat tec ate tgc 



A lp Glu Pro .Ser lie Arg His Asp Thr Ser Gly Cys Asp Ser He Cys 



tit gaa ccc agc 

He Arg His Asp Thr Ser Gly 

55 60 

240 



acc- ccc tgc cga tec age aga gga gat get aaa gaa eta cat gca act 
Thr Pro Cys Arg Ser Ser Arg Gly Asp Ala Lys Glu Leu His Ala Thr 

e5 

. iqq gaa gaa gca gaa gga gaa eag aga gag ace eta caa gga ggt gac 288 
"»S ^g Glu Glu Ala Glu Gly Glu Gin Arg Glu Thr Leu Gin Gly Gly Asp 

90 

aga gga ttt get gca cct caa tte tct ctt tgg aga aga cca gta gtc 336 
Arg Gly Ph<- Ala Ala Pro Gin Phe Ser Leu Trp Arg Arg Pro Val Val 

30 aaa qc. act att gag ggt caa tea gta gaa gta tta eta gac aca gga 384 
l, y ... Ala Thr He Glu Gly Gin Ser Val Glu Val Leu Leu Asp Thr Gly 
115 120 L?'> 

get cat aac tc, ata gta gca ggg ata gaa tta ggc age aat tae acc 432 
Ala Asp As p Ser lie Val Ala Gly He Glu Leu Gly Ser Asn Tyr Thr 
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™- r,-,-, tft --.ta aat acc aat qaa ta: 
aaa ata qta qqt qgq ata gg« 9'3 a L l - " cc4 
,c Lyl lie V,, Gly Gly He Gly Gly l'h« He A,n Thr Asn Glu 7yr 



145 



150 



aaa aat gta gaa ata qaa gta gta gga aaa aga gta aga gc 
165 "0 



Met Thr Gly Asp Thr Pro He Asn lie Phe Gly Arg Asn lie Leu Asn 

l e o 185 

agc tta ggc atg act eta aat ttc cca gta gca agg ata gaa eca gta 



19E • ) " M 205 



200 



aaa gtc cag tta aag cct gaa aaa gat ggg cca aaa ate aga caa tgg 67. 
210 

ccc eta tec aaa gag aaa ata eta gee etc aaa gaa ate tgt gaa aaa 720 

Pro Leu Ser L ys Glu i.y, He Leu Ala Leu Lys Glu He Cys Glu Lys 

225 

atg gaa aaa gag gga cag tta gaa gag gcg cct cct act aat cca tae 768 

245 250 255 

aat teg ccc acc ttc gec ata aaa aag aaa gac aaa aae aaa tgg agg 810 

" ' 260 265 270 

atg eta ata gat ttc aga gaa eta aac aag gta acc caa gaa ttt aca 864 

21b 280 285 

aag qtc cag ctg ggt att cct cac eea gca gga etg gca tea aag aaa 912 
du Val Gin Leu Gly He Pro His Pro Ala Gly Leu Ala Ser Lys Lys 

300 

290 '-- J 

.» «. ,» «. „« *. „. - - •<* «« - a : ta **° 

Arc, lie Thr Val Leu Asp Val Gly Asp Ala Tyr Phe Ser Val » 
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,., - mi. ,. r, l a Ph.-- Th >• i.^u pro Ala Vsi Asn 

Asp tio Asp Pl.o Arg Gin i y r il.i Mia tm. Hu fc«ru 

32 5 330 3 3 5 

„ rtt t=jc aaa ate eta cca cag gga 

aat gca gaa cca gga aag aga tat ctt L .ac aaa yuu 

A-n Ala Glu Pro Gly Lys Arg Tyr Leu Tyr Lys Val Leu Pro Gin Gly 

340 345 350 

tgg aag gga tec cca gca att ttc cag tac acc atg gca aag gta eta 

Trp Lys Gly Ser Pro Ala He Phe Gin Tyr Thr Met Ala Lys Val Leu 

355 360 365 



gac cct ttc aga aaa gec aac aat gat gtc act ata ate cag tae atg 
Asp Pro Phe Arg Lys Ala Asn Asn Asp Val Thr He lie Gin Tyr Met 
370 375 380 



gat gac att etc gtg gca agt gac agg age gat ctg gag cat gac agg 

Asp Asp lie Leu Val Ala Ser Asp Arg Ser Asp Leu Glu Has Asp Arg 

™„ ^qS 400 

385 



390 



395 



gta gtg tct 



eta aaa gag eta tta aat aac atg gga ttc tct act 
Val Val Ser Gin Leu Lys Glu Leu Leu Asn Asn Met Gly Phe Ser Thr 
405 410 415 



eea gaa gaa aag ttc caa aaa gac cct cea ttc aaa tgg atg ggg tat 
Pro Glu Glu Lys Phe Gin Lys Asp Pro Pro Phe Lys Trp Met Gly Tyr 
420 425 430 

gag etc tgg cca aag aaa tgg aaa ctg caa aaa ata cag eta cca gaa 
Glu Leu Trp Pro Lys Lys Trp Lys Leu Gin Lys lie Gin Leu Pro Glu 
435 440 445 



aaa gag qtt tgg aca gta aat gac 
Lys Glu Val Trp Thr Val 

4 5 ■-■ 4 5 5 



:ag aag tta gtg gga gta tta 
Asn Asp lie Gin Lys Leu Val Gly Val Leu 
4 60 



g:a get caa ctt 
Ala Ala Gin Leu Phe P 
470 



;cg ggg att aag acc agg cat ata tgt 
y lie Lys Thr Arg His lie Cys 
475 480 



aaa cca ata agg gga aag atg acc eta aca gaa gag gta caa tgg act 
Lys Leu lie Arg Gly Lys Met Thr Leu Thr Glu Glu Val Gin 



i Trp Thr 
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Glu Leu Al 



•etc cag gaa 
Phe Gin GJ u 
505 



gag cag gaa gqa tec tat tac aaa qaa ggq gta cct tta gaa gca 

Tyr Tyr Lys Glu Gly Va 1 Pro Leu Glu Ala Th 
520 525 



ca 1584 



Glu Gin Glu Gly 



gtg cag aaa aat eta gca aat cag tgg aca tac aag att cat cag qga 
Val Gin Lys Asn Leu Ala Asn Gin Trp Thr Tyr Lys He lias Gin Gly 
530 535 540 



gat aaa ate eta aaa gta gga aaa tat gca aag gtt aaa aa. 
Asp Lys He Leu Lys Val Gly Lys Tyr Ala Lys Val Lys Asn Thr Hi 
545 550 555 



cac 1680 



560 



acc aat gga gta aga eta ttg get cat gta gtc caa aaa ata gga aag 
Thr Asn Gly Val Arg Lou Leu Ala His Val Val Gin Lys lie Gly Lys 



gaa gca ttg gtc ate tgg gga gag 



ata cca atg ttc cat eta cca gta 
Glu Ala Leu Val He Trp Gly Glu He Pro Met Phe His Leu Pro Val 
580 585 590 



gaa aga gag aca tgg gat cag tgg tgg aca gat tac tgg caa gta acc 
Glu Arg Glu Thr Trp Asp Gin Trp Trp Thr Asp Tyr Trp Gin Val Thr 
595 600 605 

tgg ate cca gaa tgg gat ttt gtc tea acc cca cca tta ata agg tta 
Trp He Pro Glu Trp Asp Phe Val Ser Thr Pro Pro Leu He Arg Leu 
610 615 620 

gec tat aac ctg gtc aaa gac ccc eta gaa gga gta gaa act tac tac 
Ala Tyr Asn Leu Val Lys Asp Pro Leu Glu G.1 y Val Glu Thr Tyr Tyr 
625 630 635 640 

aca gat gga tec tgt aac aaa gee tea aaa gaa ggg aaa gca gga tat 
Thr Asp Cly Ser Cys Asn Lys Ala Ser Lys Glu Gly Lys Ala Gly Tyr 



gtc aca gac agg qga aag gat aaa gtt aaa cca tta gaa c a 

Val Thr Asp Arg Gly Lys Asp Lys Val Lys Pro Leu Glu Gin Thr Thr 
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qca qaq 



Leu 
685 



gga cca cag gtc aat ate ata gta gat tea caa tat gtc atg gga ata 
Giy Pro Gin Val Asn lie Tie Val Asp Ser Gin Tyr Val Met Gly He 
695 700 



69 0 



qta 



gaa aca gaa tea ccg ata gta aga gaa 



Val Ala Ala Gin Pro Thr Glu Thi 
705 710 



Glu Ser Pro He Val Arg Glu He 
715 720 



att gaa gaa atg atc aaa aag gaa aaa ata tat gta gga tgg gta cca 2208
Ile Glu Glu Met Ile Lys Lys Glu Lys Ile Tyr Val Gly Trp Val Pro
725 730 735

gct cac aag gga ctg ggt ggt aat cag gaa gta gac cac cta gtg agc 2256
Ala His Lys Gly Leu Gly Gly Asn Gln Glu Val Asp His Leu Val Ser
740 745 750

caa gga att aga caa atc cta ttt cta gaa aaa ata gaa cca gct caa 2304
Gln Gly Ile Arg Gln Ile Leu Phe Leu Glu Lys Ile Glu Pro Ala Gln
755 760 765

gaa gaa cat gaa aaa tat cat aat aat gta aaa gaa cta gtc cat aaa 2352
Glu Glu His Glu Lys Tyr His Asn Asn Val Lys Glu Leu Val His Lys
770 775 780



ttt ggg att cca 



785 



:a gtg gca aga caa ata gta aat tec tgt gat 
Gly He Pro Gin Leu Val Ala Arg Gin lie Val Asn Ser Cys Asp 
J0 795 800 



aaa tgc caa caa aaa ggg gaa get att cat gga cag gta aat tea gaa 
Lys Cys Gin Gin Lys Gly Glu Ala He His Gly Gin Val Asn Ser Glu 



810 



ggg aca tgg caa atg gac tgt aca cat tta gag gga aag gtt 
Gly Thr Trp Gin Met Asp Cys Thr His Leu Glu Gly Lys Val 



820 



830 

2 54 4 



ata gtg gca gtt cat gta gee agt gga ttc ata gaa gca gaa gta ata 
lie Val Ala Val His Val Ala Ser GJ y Phe He Glu Ala Glu Val He 
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Pro Gin Glu Tnr Gly Arg Gin Ti;: Ai a Leu E'hc Leu Leu Lys Leu 
850 855 860 



age aga 



:ct ate acj cac ctg cac aca qac aac ggt gec aac tec 
Ser Arg Trp Pro lie Thr His Leu His Thr Asp Asn Gly Ala Asn Phe 
865 870 875 880 

act tca caa gat gtg aaa atg gca gcc tgg tgg ata ggg ata gaa caa
Thr Ser Gln Asp Val Lys Met Ala Ala Trp Trp Ile Gly Ile Glu Gln
885 890 895

aca ttc gga gtg ccc tat aat cca gaa agt cag gga gta gta gaa gca
Thr Phe Gly Val Pro Tyr Asn Pro Glu Ser Gln Gly Val Val Glu Ala
900 905 910

atg aac cat cat ctg aaa aat cag ata gac aga att aga gat cag gca
Met Asn His His Leu Lys Asn Gln Ile Asp Arg Ile Arg Asp Gln Ala
915 920 925

gta tca ata gag aca gtt gtg tta atg gca act cac tgc atg aat ttt
Val Ser Ile Glu Thr Val Val Leu Met Ala Thr His Cys Met Asn Phe
930 935 940

aaa aga agg gga gga ata ggg gat atg acc cct gca gaa aga ata gtc
Lys Arg Arg Gly Gly Ile Gly Asp Met Thr Pro Ala Glu Arg Ile Val
945 950 955 960

aac atg ata act aca gaa caa gaa ata caa ttc ctc caa aca aaa aat
Asn Met Ile Thr Thr Glu Gln Glu Ile Gln Phe Leu Gln Thr Lys Asn
965 970 975

tta aaa ttc caa aat ttc cgg gtc tat tac aga gaa ggc aga gat caa
Leu Lys Phe Gln Asn Phe Arg Val Tyr Tyr Arg Glu Gly Arg Asp Gln

ctc tgg aag gga cct ggt gat cta ttg tgg aaa ggg gaa gga gca gtc
Leu Trp Lys Gly Pro Gly Asp Leu Leu Trp Lys Gly Glu Gly Ala Val
995 1000 1005



ta aag gta ggg aca gaa ate aaa gta ata ccc aqa aga 



gca 3072 



He He Lys Val Gly Thr Glu He Lys Val He Pro Arg Arg Lys Ala 
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ago aac tat qqa qqa qqa aaa qaa t.tg gat 
Arg Asn Tyr Gi y GJ y Gi y Lys Glu Leu Asp 
1030 1035 



aat taa 3168 



5 gac qtg qag gat acc atg cag get aga gag gtg gca cag 

Asp Val Glu Asp Thr Met Gin Ala Arg Glu Val Ala Gin Ser Asn 
10 45 1050 1055 



<210> 39
<211> 1055
<212> PRT

<213> Human immunodeficiency virus type 2

<400> 39

Met Leu Glu Met Trp Thr Ala Arg Thr His His Val Lys Met Pro Arg
1 5 10 15

Lys Thr Gly Gly Phe Phe Arg Val Arg Pro Leu Gly Lys Glu Ala Ser

Gln Phe Pro Arg Pro Gly Thr Pro Gly Asp Ser Ala Ile Cys Ala Pro
35 40 45

Asp Glu Pro Ser Ile Arg His Asp Thr Ser Gly Cys Asp Ser Ile Cys
50 55 60

Thr Pro Cys Arg Ser Ser Arg Gly Asp Ala Lys Glu Leu His Ala Thr
65 70 75 80

Arg Glu Glu Ala Glu Gly Glu Gln Arg Glu Thr Leu Gln Gly Gly Asp

Arg Gly Phe Ala Ala Pro Gln Phe Ser Leu Trp Arg Arg Pro Val Val
100 105 110

Lys Ala Thr Ile Glu Gly Gln Ser Val Glu Val Leu Leu Asp Thr Gly
115 120 125

Ala Asp Asp Ser Ile Val Ala Gly Ile Glu Leu Gly Ser Asn Tyr Thr
130 135 140
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F . ro L .,., 1Jt . -.-.si Civ Gly He Gly Gly Phe lie Asn Thr Asn Glu Tyr 



Ly , A:;tl v.. I IK- Glu V,; Val Gly Lys Arg Val Arg Ala Thr Val 

j 17 0 1 7 r > 

Met Thr Gly Asp Thr Pro Ile Asn Ile Phe Gly Arg Asn Ile Leu Asn
180 185 190

Ser Leu Gly Met Thr Leu Asn Phe Pro Val Ala Arg Ile Glu Pro Val
195 200 205

Lys Val Gln Leu Lys Pro Glu Lys Asp Gly Pro Lys Ile Arg Gln Trp
210 215 220

Pro Leu Ser Lys Glu Lys Ile Leu Ala Leu Lys Glu Ile Cys Glu Lys
225 230 235 240

Met Glu Lys Glu Gly Gln Leu Glu Glu Ala Pro Pro Thr Asn Pro Tyr
245 250 255

Asn Ser Pro Thr Phe Ala Ile Lys Lys Lys Asp Lys Asn Lys Trp Arg
260 265 270

Met Leu Ile Asp Phe Arg Glu Leu Asn Lys Val Thr Gln Glu Phe Thr
275 280 285

Glu Val Gln Leu Gly Ile Pro His Pro Ala Gly Leu Ala Ser Lys Lys
290 295 300



Arg He Thr Val l.c 



Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu 
310 315 



320 



A- p Ho A-p PH Hu G-r. Ty: Thr Ala Phe Thr Leu Pro Ala Val Asn 

" P ^ 32 5 ' 330 335 

Asn Ala Glu Pro Gly Lys Arg Tyr Leu Tyr Lys Val Leu Pro Gln Gly
340 345 350

Trp Lys Gly Ser Pro Ala Ile Phe Gln Tyr Thr Met Ala Lys Val Leu
355 360 365

Asp Pro Phe Arg Lys Ala Asn Asn Asp Val Thr Ile Ile Gln Tyr Met
370 375 380




Asv Asp lie Leu Val Ala 5er Asp Arg Ser Asp Leu Glu Hxs Asp Arg 



v . , Val : -. eL Gin lifcU Lys Glu Leu Leu Asn Asn Mot Gay i'ne ber 

410 415 



Tin 



Pro Glu Glu Lys Phe Gln Lys Asp Pro Pro Phe Lys Trp Met Gly Tyr
420 425 430

Glu Leu Trp Pro Lys Lys Trp Lys Leu Gln Lys Ile Gln Leu Pro Glu
435 440 445

Lys Glu Val Trp Thr Val Asn Asp Ile Gln Lys Leu Val Gly Val Leu
450 455 460

Asn Trp Ala Ala Gln Leu Phe Pro Gly Ile Lys Thr Arg His Ile Cys



465 470 



475 



Lys Leu Ile Arg Gly Lys Met Thr Leu Thr Glu Glu Val Gln Trp Thr
485 490 495

Glu Leu Ala Glu Ala Glu Phe Gln Glu Asn Lys Ile Ile Leu Glu Gln
500 505 510

Glu Gln Glu Gly Ser Tyr Tyr Lys Glu Gly Val Pro Leu Glu Ala Thr
515 520 525

n Leu Ala Asn Gin Trp Thr Tyr Lys He His Gin Gly 



Val Gin Lys As 
20 530 535 



540 



Asp Lys lie Leu Lys Val Gly Lys Tyr Ala Lys Val l.ys Asn Thr His 
545 550 555 560 



Thr Asn Gly Val Arg Leu 



565 570 :>'-> 

Glu Ala Leu Val Ile Trp Gly Glu Ile Pro Met Phe His Leu Pro Val
580 585 590

Glu Arg Glu Thr Trp Asp Gln Trp Trp Thr Asp Tyr Trp Gln Val Thr
595 600 605

Trp Ile Pro Glu Trp Asp Phe Val Ser Thr Pro Pro Leu Ile Arg Leu
610 615 620



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Ala Tyr Asn Leu Val I.ys Asp Pro Leu GIu Gly Val Glu Thr Tyr Ty 



64 0 



.sn Lys Ala liar Lys Glu Gly l.ys A, a Giy 
650 655 



5 Val Thr Asp Arg Gly Lys Asp Lys Val Lys Pro Leu Glu Gin Thr Thr 
660 665 670 

Asn Gin Gin Ala Glu Leu Glu Ala Phe Ala Leu Ala Leu Gin Asp Ser 



675 



680 



685 



Gly P^o Gin Val Asn He He Val Asp Ser Gin Tyr Val Met Gly He 
10 690 695 700 

Val Ala Ala Gin Pro Thr Glu Thr Glu Ser Pro He Val Arg Glu He 



Tin ^ 720 

705 710 



715 



He Glu Glu Met He Lys Lys Glu Lys He Tyr Val Gly Trp Val Pro 
725 



15 Ala His Lys Gly Leu Gly Gly Asn Gin Glu Val Asp His Leu Val Ser 
Gin Gly He Arg Gin He Leu Phe Leu Glu Lys He Glu Pro Ala Gin 



755 



760 



765 



Glu Glu His Glu Lys Tyr His Asn Asn Val Lys Glu Leu Val His Lys 

20 770 7 75 780 

Phe dy He Pro Gin Leu Val Ala Arg Gin He Val Asn Ser Cys Asp 

™~ iQS 800 

785 790 795 

Gin Gin Lyi 31y Glu Ala He H: s Gly Gin Val Asn Ser Glu 
805 810 815 



25 Leu Gly Thr Trp Gin Met Asp Cys Thr His Leu Glu Gly Lys Val lie 
820 825 830 

He V.l Ala Val His Val Ala Ser Gly Phe He Glu Ala Glu Val He 



835 



84 0 



81 5 



Pro Gin Glu Thr Gly Arg Gin Thr Ala Leu Phe Leu Leu Lys Leu Ala 



30 8 50 



860 
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Tt ,, c-o He Tr.r Mi s Leu His Thr Asp As n Gly Ala Asn Phe 

\ HI'- 8 8 0 



Met Ala Ala Trp Tip lie Gly lie Glu Gin 
ft Q 0 895 



Thr Phe Gly Val Pro Tyr Asn Pro Glu Ser Gln Gly Val Val Glu Ala
900 905 910

Met Asn His His Leu Lys Asn Gln Ile Asp Arg Ile Arg Asp Gln Ala
915 920 925

Val Ser Ile Glu Thr Val Val Leu Met Ala Thr His Cys Met Asn Phe

Lys Arg Arg Gly Gly Ile Gly Asp Met Thr Pro Ala Glu Arg Ile Val
945 950 955 960

Asn Met Ile Thr Thr Glu Gln Glu Ile Gln Phe Leu Gln Thr Lys Asn
965 970 975

Leu Lys Phe Gln Asn Phe Arg Val Tyr Tyr Arg Glu Gly Arg Asp Gln
980 985 990

Leu Trp Lys Gly Pro Gly Asp Leu Leu Trp Lys Gly Glu Gly Ala Val
995 1000 1005

Ile Ile Lys Val Gly Thr Glu Ile Lys Val Ile Pro Arg Arg Lys Ala
1010 1015 1020

Lys Ile Ile Arg Asn Tyr Gly Gly Gly Lys Glu Leu Asp Cys Ser Ala
1025 1030 1035 1040

Asp Val Glu Asp Thr Met Gln Ala Arg Glu Val Ala Gln Ser Asn
1045 1050 1055



<210> 40

<211> 3643

<212> DNA

<213> Murine leukemia virus

<220>
<221> CDS
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Cricj gag ccc 
■ Gin Glu Pro 



ccc agg 



Pro Pro Glu Pro Arg He Thr Leu Lys 



10 



15 



q t- nnn c r .q caa ccc gtc acc tie ctg gta gat act ggg gec caa cue 96 
■■ i r-, Y Gly ''In Pro Val Thr Phe Leu Va 1 Asp Thr Gly Ala Gin His 

10 ^ „tg ctg ., caa aat cct gga ccc eta agt gat aag tct gec: tgg 144 
v.l U-u Thr Gin Asn Pro Gly Pro Leu Ser Asp Lys Ser Ala Trp 
35 4° 45 

c,tr caa qqg get act gga gga aag egg tat cgc tgg acc acg gat cgc 192 

Val Gin Gly At a Thr Gly Gly Lys Arg Tyr Arg Trp Thr Thr Asp Arg 

15 

.,„., eta cat eta get acc ggt aag gtc acc cac tct ttc etc cat gta 240 
j , .. va! Mrs Leu Ala Thr Gly Lys Val Thr His Ser Phe Leu His Val 

7S 80 

(,5 70 

cac tgt ccc tat cct ctg tta gga aga gat ttg ctg act aaa eta 288 
20 Pro A:, P Cys Pro Tyr Pro Leu Leu Gly Arg Asp Leu Leu Thr Lys Leu 
65 90 95 

l(1 y „. ,-,a ,tc cac ttt gag gga tea gga get cag gtt atg gga cca 336 
1,,, Ala Gin lie His Phe Glu Gly Ser Gly Ala Gin Val Met Gly Pro 



100 



105 H° 



, nf , cag --c eta caa gtg ttg acc eta aat ata gaa gat 

., - T . t-v r - fe , £ S - He Glu Asp Glu Hi 



1 1 



120 



cat 384 

Asp Glu 

125 



egg eta cat gag acc tea aaa gag cca gat gtt tet eta ggg tec aea 432 
Arg U-u His Glu Thr Ser Lys Glu Pro Asp Val Ser Leu Gly Ser Thr 
130 13b 140 

tq ,, clq trL qdt ttt cct cag gec tgg gcg gaa acc ggg ggc atg gga 480 
Tro L-U Ser Asp Phe Pro Gin Ala Trp Ala Glu Thr Gly Gly Met Gly 

-1 C L 160 

145 iSO 1" 
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tec ata aaa caa tac ccc atg tea caa gaa gec aga ctg 
Ser lie Lys Gin Tyr Pro Met Ser Gin Glu Ala Arg Leu 
180 



185 19° 



ggg ate aag ccc cac ata cag aga ctg ttg gac cag gga ata ctg gta 
Gly He Lys Pro His He Gin Arg Leu Leu Asp Gin Gly He Leu Val 



195 200 205 



tec ccc tgg aac acg ccc ctg 



: gtt aag aaa cca 



Pro Cys Gin Ser Pro Trp Asn Thr Pro Leu Leu Pro Val Lys Lys Pro 



210 215 



220 



ggg act aat gat tat agg cct gtc cag gat ctg aga gaa gtc aac aag 
Gly Thr Asn Asp Tyr Arg Pro Val Gin Asp Leu Arg Glu Val Asn Lys 

225 

egg gtg gaa gac ate cac ccc acc gtg ccc aac cct tac aac etc ttg 
Arg Val Glu Asp lie Hxs Pro Thr Val Pro Asn Pro Tyr Asn Leu Leu 
245 



ggg etc cca ccg tec c 



tgg tac act gtg ctt gat tta aag 



20 Ser Gly Leu Pro Pro Ser Hxs Gin Trp Tyr Thr Val Leu Asp Leu Lys 
260 

gat gec ttt ttc tgc ctg aga etc cac ccc acc agt cag cct etc ttc 864 
Asp Ala Phe Phe Cys Leu Arg Leu Hxs Pro Thr Ser Gin Pro Leu Phe 
275 280 285 

25 gec ttt gag tgg aga gat cca gag atg gga ate tea gga caa ttg ace 912 
Ala Phe Glu Trp Arg Asp fro Glu Met Cxj ^fc -C- 
290 295 300 

tgg ace aga etc cca cag ggt ttc aaa aac agt cec acc ctg ttt gat 960 
Trp Thr Arg Leu Pro Gin Gly Phe Lys Asn Ser Pro Thr Leu Phe Asp 

,,r 320 

30 305 310 31, 

gag gea ctg eae aga gac eta gea gac ttc egg ate cag eac cca gac 100£ 
Glu Ala Leu His Arg Asp Leu Ala Asp Phe Arg He Gin Hxs Pro Asp 

"< -3 r , 
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era ciq 
Leu Gin 

34 0 



Ala Ala 
350 



gag -t. ,ac tgc caa caa ggt act egg gec ctg tta caa ace eta ggg 1104 
Giu Leu Asp Cys Gin Gin Gly Thr Arg Ala Leu Leu Gin Thr Leu Gly 
360 



aac etc ggg tat egg gee teg gc 
Asn Leu Gly Tyr Arg Ala Ser Al 
370 375 



: aag aaa gc 
i Lys Lys Al 



: caa att tgc cag aaa 
Gin He Cys Gin Lys 
300 



lib 



cag gtc aag tat ctg ggg tat ett eta aaa gag ggt cag aga tgg ctg 
Gin Val Lys Tyr Leu Gly Tyr Leu Leu Lys Glu Gly Gin Arg Trp Leu 

385 



atg ggg cag cct 



act ccg aag acc 



gec aga aaa gag act gtg 

Ala Arg Lys Glu Thr Val Met Gly Gin Pro Thr Pro Ly: 
405 410 



415 



cct cga caa eta agg gag ttc eta ggg acg gca gge ttc tgt cge etc 
Pro Arg Gin Leu Arg Glu Phe Leu Gly Thr Ala Gly Phe Cys Arg Leu 
420 



20 



tgg ate cct ggg ttt gca ga. 
Trp He Pro Gly Phe Ala Gli 
435 



atg gca gec ccc ttg tac cct etc acc 
Met Ala Ala Pro Leu Tyr Pro Leu Thr 
440 445 



acg ggg act ctg ttt aat tgg ggc cea gac caa caa aag gee tat 
Thr Gly Thr Leu Phe Asn Trp Gly Pro Asp Gin Gin Lys Ala Tyr 
450 455 460 



25 



Gi n ' 
465 



: aag caa get ctt 



470 



, act gee cea gec ctg ggg ttg cca 
Thr Ala Pro Ala Leu Gly Leu Pro 
475 480 



30 



gat ttg act aag ccc ttt gaa etc ttt gtc gac gag aag cag ggc tac 

Asp L.u Thr Lys Pro Phe Glu Leu Phe Val Asp Glu Lys Gin Gly Tyr 

a q r\ 495 

485 490 

gec aaa ggt gtc eta acg caa aaa ctg gga cct tgg cgt egg ccg gtg 

Ala Lys Gly Val Leu Thr Gin Lys Leu Gly Pro Trp Arg Arg Pro Val 

500 ^ 510 
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tgg ccc 
Trp Pro 



toe eta egg atg gta qca gec att gec gta 
Cys Leu Arg Mot Val Ala Ala He Ala Val 
530 535 

aag eta ace atg gga cag cca eta gtc att 
Lys Leu Thr Met Gly Gin Pro Leu Val He 
545 550 

gag gca eta gtc aaa caa ccc ccc gac cgc 
Glu Ala Leu Val Lys Gin Pro Pro Asp Arg 
565 "0 



ctg aca aag gat gca gqc 

Leu Thr Lys Asp Ala Gly 
540 

ctg gec ccc cat gca gta 

Leu Ala Pro His Ala Val 

555 560 

tgg ctt tec aac gec egg 
Trp Leu Ser Asn Ala Arg 
575 



atg act cac tat cag gec ttg ctt ttg gac 
Mot Thr His Tyr Gin Ala Leu Leu Leu Asp 
580 585 



acg gac egg gtc cag ttc 
Thr Asp Arg Val Gin Phe 
590 



gga ccg gtg gta gee ctg aac ccg get acg 
Gly Pro Val Val Ala Leu Asn Pro Ala Thr 
595 600 



ctg etc cca ctg cct gag 
Leu Leu Pro Leu Pro Glu 
605 



20 



25 



gaa ggg ctg caa cac aac tgc ctt gat ate c 
Glu Gly Leu Gin His Asn Cys Leu Asp He I 
610 615 

acc cga ccc gac eta acg gac cag ccg etc < 

Thr Arg Pro Asp Leu Thr Asp Gin Pro Leu : 
625 630 

tgg tac acg gat gga age aqt etc tta caa 

Trp Tyr Thr Asp Ciy Ser -ex Leu Leu G:r. 

645 6 50 



gec gaa gec cac gga 
, Ala Glu Ala His Gly 
620 

i gac gec gac cac acc 
j Asp Ala Asp His Thr 
b 640 

g gga cag cgt aag gcg 
j ciy Gir. Arg Lys Ala 
655 



30 



gga get gcg gtg acc acc gag acc gag gta 
Gly Ala Ala Val Thr Thr Glu Thr Glu Val 
660 



tgg get aaa gec ctg 
Trp Ala Lys Ala Leu 
670 



ca qcc ggg aca tec get cag egg get gaa 
'ro Ala Gly Thr -Ser Ala Gin Arc Ala Glu 

675 680 



ata qca etc acc cag 
i lie Ala Leu Thr Gin 
685 
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atg qca gaa qut aay aay * — 

Leu Asn Va] Tyr Thr Asp Se 



: Ala G.lu Gly Ly. 



700 



-,t get ttt qct act gec cat ate cat qga qaa ata tac aga agg 
5 A,q Tyr Ala Phe Ala Thr Ala His He Has Gly Glu He Tyr Arg Arg 



705 "7 10 



715 720 



,gt agg ttg etc aca tea gaa ggc aaa gag ate aaa aat aaa gac gag 
Arq Gly Leu Leu Thr Ser Glu Gly Lys Glu He Lys Asn Lys Asp Glu 
725 730 735 

ate ttg gee eta eta aaa gee cte ttt etg eee aaa aga ett age ata 
lie Leu Ala Leu Leu Lys Ala Leu Phe Leu Pro Lys Arg Leu Ser lie 
7 4 0 



74 5 7 50 



at. eat tgt eea gga eat eaa aag gga cac age gee gag get aga gge 2304 
He His Cys Pro Gly His Gin Lys Gly His Ser Ala Glu Ala Arg Gly 
760 7" 



15 ^5 

aac egg atg get gae eaa geg gee ega aag gea gee ate aea gag act 
Asn Arg Met Ala Asp Gin Ala Ala Arg Lys Ala Ala lie Thr Glu Thr 
770 775 780 

20 Pro Asp Thr Ser Thr Leu Leu lie Glu Asn Ser Ser Pro Tyr Thr Ser 
785 790 795 800 

gaa ttt cat tac aca gtg act gat ata aag gae eta aec aag ttg 

Glu Mis Phe His Tyr Thr Val Thr Asp He Lys Asp Leu Thr Lys Leu 
805 



2352 



810 815 



25 gqg gee att tat gat aaa aca aag aag tat tgg gte tac caa gga aaa 
- l vs . v ._ Tvr Trp VaJ Tyr Gin Gly Lys 



820 



825 830 



ttt gaa tta tta gac ttt ctt cat 2544 



cct gt-j atg cct gac cag ttt 
Pro Met Pro Asp Gin Phe Thr Phe Glu Leu Leu Asp Phe Leu His 

30 835 B40 B45 

eag -a a^ eae cte age ttc tea aaa atg aag get etc eta gag aga 
Gin Leu Thr His Leu Ser Phe Ser Lys Met Lys Ala Leu Leu Glu Arg 
d^, 8 60 



2592 
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.-. . r „ r--, L F ,. c T yr Met Leu Asn Aiq Asp Arg Thr Leu Lys Asn 

Zl ' 11S ' " B70 «8U 

at.c act gag acc tqc aaa get tgt gca caa gtc aac gec age aag tct 2688 

lie Thr Glu Thr Cys Lys Ala Cys Ala Gin Val Asn AJ a Ser Lys Ser 
885 895 

gec gtt aaa cag gga act agg gtc cgc ggg cat egg ccc ggc act cat 2736 
Ala Val Lys Gin Gly Thr Arg Val Arg Gly Has Arg Fro Gly Thr His 
905 910 



900 

tgg gag ate gat ttc acc gag ata aag ccc gga ttg tat ggc tat 

Tyr 

920 925 



2784 



tgg gag ate gat ttc acc gay cue. coy 

Trp Glu lie Asp Phe Thr Glu He Lys Pro Gly Leu Tyr Gly Tyr Lys 



915 



tat ctt eta gtt ttt ata gat acc ttt tct ggc tgg ata gaa gec ttc 
Tyr Leu Leu Val Phe He Asp Thr Phe Ser Gly Trp He Glu Ala Phe 



935 



94 0 



cca acc aag aaa gaa acc gec aag gtc gta acc aag aag eta eta gag 
Pro Thr Lys Lys Glu Thr Ala Lys Val Val Thr Lys Lys Leu Leu Glu 
945 950 955 960 

gag ate ttc ccc agg ttc ggc atg cct cag gta ttg gga act gac aat 



gag ate ttc ccc agg ttc ggc atg cct cag gta uuy yya ^ 

Glu lie Phe Pro Arg Phe Gly Met Pro Gin Val Leu Gly Thr Asp Asn 
965 970 975 

ggg cct gec tte gtc tec aag gtg agt eag aea gtg gee gat ctg ttg 2976 
Gly Pro Ala Phe Val Ser Lys Val Ser Gin Thr Val Ala Asp Leu Leu 
980 985 990 

ggg att gat tgg aaa tta pat tgt gca tac aga ecc caa age tea ggc 3024 

u ,< --■■= F A-a D ro Gin Ser Ser Gly 

ul> lie Asp i r.p ->s - ^ — - j - 

995 1000 1005 

cag gta gaa aga atg aat aga acc atc aag gag act tta act aaa tta 3072
Gln Val Glu Arg Met Asn Arg Thr Ile Lys Glu Thr Leu Thr Lys Leu
1010 1015 1020

acg ctt gca act ggc tct aga gac tgg gtc ctc cta ctc ccc tta gcc 3120
Thr Leu Ala Thr Gly Ser Arg Asp Trp Val Leu Leu Leu Pro Leu Ala
1025
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04 5 



ccg ggc ccc cat ggc etc acc c^ ^ 
Pro Gly Pro !hs Gly Leu Thr Pro Tyr 



3 J ^ 



gao ate tta tat qqq qca ccc ccg ccc ctt gta aac ttc cct gac cct 
Glu Lie Leu Tyr Giy Ala Pro Pro Pro Leu Val Asn Phe Pro Asp Pro 
1060 

gac atg aca aga gtt act aac agc ccc tct ctc caa gct cac tta cag 3264
Asp Met Thr Arg Val Thr Asn Ser Pro Ser Leu Gln Ala His Leu Gln
1075 1080 1085

gct ctc tac tta gtc cag cac gaa gtc tgg aga cct ctg gcg gca gcc 3312
Ala Leu Tyr Leu Val Gln His Glu Val Trp Arg Pro Leu Ala Ala Ala
1090 1095 1100

gaa caa ctg gac cga ccg gtg gta cct cac cct tac cga gtc 3360 
Tyr Gln Glu Gln Leu Asp Arg Pro Val Val Pro His Pro Tyr Arg Val
1110 1115



ggc gac aca gtg tgg gtc cgc cga cac cag act aag aac cta gaa cct
Gly Asp Thr Val Trp Val Arg Arg His Gln Thr Lys Asn Leu Glu Pro
1125 1130 1135 

cgc tgg aaa gga cct tac aca gtc ctg ctg acc acc ccc acc gcc ctc
Arg Trp Lys Gly Pro Tyr Thr Val Leu Leu Thr Thr Pro Thr Ala Leu
1140 

aaa gta gac ggc atc gca gct tgg ata cac gcc gcc cac gtg aag gct
Lys Val Asp Gly Ile Ala Ala Trp Ile His Ala Ala His Val Lys Ala

1165 



34 OH 



1155 II 60 



gcc gac ccc ggg ggt gga cca tcc tct aga ctg aca tgg cgc gtt caa 3552
Ala Asp Pro Gly Gly Gly Pro Ser Ser Arg Leu Thr Trp Arg Val Gln

^"ina ' 1180 

cgc tct caa aac ccc tta aaa ata agg tta acc cgc gag gcc ccc 3597
Arg Ser Gln Asn Pro Leu Lys Ile Arg Leu Thr Arg Glu Ala Pro
1185 1190 1195

taatcccett aattcttctg atgetcagag gggtcagtac tgette 



364 



<210> 41 
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-.211:- 11^ 

<212> PRT

<400> 41

5 r,i Y Gly Gin Gi y Gin Glu Pro Pro Pro Giu Pro Arg He Thx Leu Lys 

1 5 10 15

Val Gly Gly Gln Pro Val Thr Phe Leu Val Asp Thr Gly Ala Gln His
20 25 30 

Ser Val Leu Thr Gln Asn Pro Gly Pro Leu Ser Asp Lys Ser Ala Trp
35 40 45

Val Gln Gly Ala Thr Gly Gly Lys Arg Tyr Arg Trp Thr Thr Asp Arg
50 55 60 

Lys Val His Leu Ala Thr Gly Lys Val Thr His Ser Phe Leu His Val 
65 

Pro Asp Cys Pro Tyr Pro Leu Leu Gly Arg Asp Leu Leu Thr Lys Leu

85 90 95 

Lys Ala Gln Ile His Phe Glu Gly Ser Gly Ala Gln Val Met Gly Pro
100 105 110

Met Gly Gln Pro Leu Gln Val Leu Thr Leu Asn Ile Glu Asp Glu His
115 120 125

Arg Leu His Glu Thr Ser Lys Glu Pro Asp Val Ser Leu Gly Ser Thr 



130 



140 



Trp Leu Ser Asp Phe Pro Gln Ala Trp Ala Glu Thr Gly Gly Met Gly

,r t 160 



Leu Ala Val Arg Gln Ala Pro Leu Ile Ile Pro Leu Lys Ala Thr Ser
165 170 175

Thr Pro Val Ser Ile Lys Gln Tyr Pro Met Ser Gln Glu Ala Arg Leu
185 190

Gly Ile Lys Pro His Ile Gln Arg Leu Leu Asp Gln Gly Ile Leu
195 200 205
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A , f! y a . C ;lu Asp He H.is Pro Thr Val Pro Asn Pro Tyr Asn Leu Leu 

250 255 

, er Glv Ij( U i t( . i ro ; -,,. r His Gin Trp Tyr Thr Val Leu Asp Leu Lys 



265 



270 



Asp Ala Phe Phe Cys Leu Arg Leu His Pro Thr Ser Gln Pro Leu Phe

275 280 285 

Ala Phe Glu Trp Arg Asp Pro Glu Met Gly Ile Ser Gly Gln Leu Thr

290 295 300

Trp Thr Arg Leu Pro Gln Gly Phe Lys Asn Ser Pro Thr Leu Phe Asp
305 310 315 320 

Glu Ala Leu His Arg Asp Leu Ala Asp Phe Arg Ile Gln His Pro Asp
325 330 335 

Leu Ile Leu Leu Gln Tyr Val Asp Asp Leu Leu Leu Ala Ala Thr Ser
340 345 350 

Glu Leu Asp Cys Gln Gln Gly Thr Arg Ala Leu Leu Gln Thr Leu Gly
355 360 365 

Asn Leu Gly Tyr Arg Ala Ser Ala Lys Lys Ala Gln Ile Cys Gln Lys
370 375 380

. ,. ' t_.. £ u Glv Gin Ara Trp Leu 

^ i; 11 " ' 7so "" ' 395 

Thr Glu Ala Arg Lys Glu Thr Val Met Gly Gln Pro Thr Pro Lys Thr
405 410 415 

Pro Arg Gln Leu Arg Glu Phe Leu Gly Thr Ala Gly Phe Cys Arg Leu
420 425 430

Trp Ile Pro Gly Phe Ala Glu Met Ala Ala Pro Leu Tyr Pro Leu Thr
435 440 445 
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I/.-e Thr Glv Thr Leu Phe Asn Trp Gly Pro Asp GJ n Gin I.ys Alci Tyr 



465 470 475 480

Asp Leu Thr Lys Pro Phe Glu Leu Phe Val Asp Glu Lys Gln Gly Tyr
485 490 495

Ala Lys Gly Val Leu Thr Gln Lys Leu Gly Pro Trp Arg Arg Pro Val
500 505 510



Ala Tyr Leu Ser Lys Lys Leu Asp Pro Val Ala Ala Gly Trp Pro Pro



Cys Leu Arg Met Val Ala Ala Ile Ala Val Leu Thr Lys Asp Ala Gly
530 535 540

Lys Leu Thr Met Gly Gln Pro Leu Val Ile Leu Ala Pro His Ala Val
545 550 555 560

Glu Ala Leu Val Lys Gln Pro Pro Asp Arg Trp Leu Ser Asn Ala Arg
565 570 575

Met Thr His Tyr Gln Ala Leu Leu Leu Asp Thr Asp Arg Val Gln Phe
580 585 590 



Gly Pro Val Val Ala Leu Asn Pro Ala Thr Leu Leu Pro Leu Pro Glu
595 600 605

Glu Gly Leu Gln His Asn Cys Leu Asp Ile Leu Ala Glu Ala His Gly



Thr Arg Pro Asp Leu Thr Asp Gln Pro Leu Pro Asp Ala Asp His Thr
625 630 635 640 

Trp Tyr Thr Asp Gly Ser Ser Leu Leu Gln Glu Gly Gln Arg Lys Ala
645 650 655

Gly Ala Ala Val Thr Thr Glu Thr Glu Val Ile Trp Ala Lys Ala Leu

660 665 670

Pro Ala Gly Thr Ser Ala Gln Arg Ala Glu Leu Ile Ala Leu Thr Gln
675 680 685
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Met A] a Glu Gl y Lys Lys Leu Asn Val Tyr Thr Asp Sei 



705 710 715 720

Arg Gly Leu Leu Thr Ser Glu Gly Lys Glu Ile Lys Asn Lys Asp Glu
725 730 735

Ile Leu Ala Leu Leu Lys Ala Leu Phe Leu Pro Lys Arg Leu Ser Ile
740 745 750 



Ile His Cys Pro Gly His Gln Lys Gly His Ser Ala Glu Ala Arg Gly
755 760 765

Asn Arg Met Ala Asp Gln Ala Ala Arg Lys Ala Ala Ile Thr Glu Thr
770 775 780

Pro Asp Thr Ser Thr Leu Leu Ile Glu Asn Ser Ser Pro Tyr Thr Ser
785 790 795 800

Glu His Phe His Tyr Thr Val Thr Asp Ile Lys Asp Leu Thr Lys Leu



810 815

Gly Ala Ile Tyr Asp Lys Thr Lys Lys Tyr Trp Val Tyr Gln Gly Lys
820 825 830

Pro Val Met Pro Asp Gln Phe Thr Phe Glu Leu Leu Asp Phe Leu His
835 840 845

Gln Leu Thr His Leu Ser Phe Ser Lys Met Lys Ala Leu Leu Glu Arg
850 855 860 

, ~ r . -r.. r Tvr Me - Leu Asn Arg Asp Ara Thr Leu Lys Asn 

865 870 875 880

Ile Thr Glu Thr Cys Lys Ala Cys Ala Gln Val Asn Ala Ser Lys Ser
885 890 895

Ala Val Lys Gln Gly Thr Arg Val Arg Gly His Arg Pro Gly Thr His
900 905 910

Trp Glu Ile Asp Phe Thr Glu Ile Lys Pro Gly Leu Tyr Gly Tyr Lys
915 920 925
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u V.,1 t-i.c- T I ft Ar.p Thr Phe Set Gly Trp He GJ u Ala Phe 



Pro Thr Lys Lys Glu Thr Ala Lys Val Val Thr Lys Lys Leu Leu Glu

945 950 955 960

Glu Ile Phe Pro Arg Phe Gly Met Pro Gln Val Leu Gly Thr Asp Asn

965 970 975 



Gly Pro Ala Phe Val Ser Lys Val Ser Gln Thr Val Ala Asp Leu Leu
985 990

Gly Ile Asp Trp Lys Leu His Cys Ala Tyr Arg Pro Gln Ser Ser Gly
995 1000 1005

Gln Val Glu Arg Met Asn Arg Thr Ile Lys Glu Thr Leu Thr Lys Leu
1010 1015 1020 

Thr Leu Ala Thr Gly Ser Arg Asp Trp Val Leu Leu Leu Pro Leu Ala
1025 1030 1035 1040

Leu Tyr Arg Ala Arg Asn Thr Pro Gly Pro His Gly Leu Thr Pro Tyr
1045 1050 1055

Glu Ile Leu Tyr Gly Ala Pro Pro Pro Leu Val Asn Phe Pro Asp Pro
1060 1065 1070

Asp Met Thr Arg Val Thr Asn Ser Pro Ser Leu Gln Ala His Leu Gln



Ala Leu Tyr Leu Val Gln His Glu Val Trp Arg Pro Leu Ala Ala Ala
1090 1095 1100

Tyr Gln Glu Gln Leu Asp Arg Pro Val Val Pro His Pro Tyr Arg Val
1105 1110 1115 1120

Gly Asp Thr Val Trp Val Arg Arg His Gln Thr Lys Asn Leu Glu Pro
1125 1130 1135

Arg Trp Lys Gly Pro Tyr Thr Val Leu Leu Thr Thr Pro Thr Ala Leu
1140 1145 1150

Lys Val Asp Gly Ile Ala Ala Trp Ile His Ala Ala His Val Lys Ala
1155 1160 1165
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Ser Arg Leu Thr Trp Arg Val Gin 



,i:g Leu thr . 
119b 



I u Ax a P r 



<211> 2709
<212> DNA

<213> Human immunodeficiency virus type 1



10 . :m 



.,tc, -t. gqq gga att gga ggt ttt ate aaa gta aga cag tat gat cag 
Met Ile Gly Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln



ltJ rtc ata gaa ate tqt gga cat aaa get ata ggt aca gta tta gta 96 
Ile Leu Ile Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val



gtc aac ata att gga aga aat ctg ttg act cag att 144
Gly Pro Thr Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile



v ~ act tta aat ttt ccc att agt cct att gaa act gta cca gta 
Gly Cys Thr Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val



^ „„ Ccd ., ^ ^ ^ q 3 c aaa gtt aaa ena tgq cca ttg 240 

Lys Leu Lys Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu
70 75

(J ,n aaa ata aaa gca tta gta gaa att tgt aca gaa atg gaa 288 
77 77 Glu Lyl lie Lys Ala Leu Val Glu lie Cys Thr Glu Met Glu 

30 eb 90 

«aa ggg aaa att tea aaa att ggg cct gaa aat cca tac aat act 336 
Lys Glu Gly Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn Thr
105 110
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120 12 5 



gta gat t:: .uja gaa ctt aat aag aga act caa gac ttc tgg cjaa gtt 432 
Val Asp Phe Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu Val
130 135 140 

caa tta gga ata cca cat ccc gca ggg tta aaa aag aaa aaa tca gta 480
Gln Leu Gly Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val
145 150 155 160 

aca gta ctg gat gtg ggt gat gca tat ttt tca gtt ccc tta gat gaa 528
Thr Val Leu Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu
165 170 175

gac ttc agg aag tat act gca ttt acc ata cct agt ata aac aat gag 576 
Asp Phe Arg Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu
180 185 190

aca cca ggg att aga tat cag tac aat gtg ctt cca cag gga tgg aaa 624
Thr Pro Gly Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys
195 200 205



gga tca cca gca ata ttc caa agt agc atg aca aaa atc tta gag cct 672
Gly Ser Pro Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro
210 215 220

ttt aga aaa caa aat cca gac ata gtt atc tat caa tac atg gat gat 720
Phe Arg Lys Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp
225 230 235 240



ttg tat gta gga tct gac tta gaa ata ggg cag cat aga aca aaa ata 768
Leu Tyr Val Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile
245 250 255 

gag gag ctg aga caa cat ctg ttg agg tgg gga ctt acc aca cca gac 816 
Glu Glu Leu Arg Gln His Leu Leu Arg Trp Gly Leu Thr Thr Pro Asp
260 265 270

aaa aaa cat cag aaa gaa cct cca ttc ctt tgg atg ggt tat gaa ctc 864
Lys Lys His Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu

275 280 285 
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cit cat gut oaa tgg 



gtg ctq 



Pro Asp Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys Asp



gtc aat gac ata cag aag tta gtg gga aaa ttg aat tgg
Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn Trp



gca agt cag att tac cca ggg att aaa gta agg caa tta tgt aaa ctc
Ala Ser Gln Ile Tyr Pro Gly Ile Lys Val Arg Gln Leu Cys Lys Leu



ctt aga gga acc aaa gca eta aca gaa gta ata cca eta aca gaa ^ 
Leu Arg Gly Thr Lys Ala Leu Thr Glu Val Ile Pro Leu Thr Glu Glu
340 345 350 

gca gag cta gaa ctg gca gaa aac aga gag att cta aaa gaa cca gta 1104
Ala Glu Leu Glu Leu Ala Glu Asn Arg Glu Ile Leu Lys Glu Pro Val
355 360 365 

cat gga gtg tat tat gac cca tca aaa gac tta ata gca gaa ata cag 1152
His Gly Val Tyr Tyr Asp Pro Ser Lys Asp Leu Ile Ala Glu Ile Gln
370 375 380 

aag cag ggg caa ggc caa tgg aca tat caa att tat caa gag cca ttt 1200
Lys Gln Gly Gln Gly Gln Trp Thr Tyr Gln Ile Tyr Gln Glu Pro Phe
385 390 395



aat ctg aaa aca gga aaa tat gca aga acg agg ggt gcc cac act 1248
Lys Asn Leu Lys Thr Gly Lys Tyr Ala Arg Thr Arg Gly Ala His Thr
405 410 415 

gat gta aaa caa tta aca gag gca gtg caa aaa ata acc aca gaa 1296 

Asn Asp Val Lys Gln Leu Thr Glu Ala Val Gln Lys Ile Thr Thr Glu
420 425 430



ata gta ata tgg gga aag act cct aaa ttt aaa cta ccc ata caa
Ser Ile Val Ile Trp Gly Lys Thr Pro Lys Phe Lys Leu Pro Ile Gln
435 440 445



aag gaa aca tgg gaa aca tgg tgg aca gag tat tgg caa gcc acc tgg
Lys Glu Thr Trp Glu Thr Trp Trp Thr Glu Tyr Trp Gln Ala Thr Trp
450 455 460
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tea ale aaa tta tqq 14 4 0 

.1 t l a..u tgg aag ttt gtc aat av_'_ <-■_ ^~ ^ , J 

Ile Pro Glu Trp Glu Phe Val Asn Thr Pro Pro Leu Val Lys Leu Trp

470 475 480

t ., , ui gag ...a gaa ccc ata qta gga qca gaa acq ttc tal gta 1488 

TVI ran i,h, Glu Lys Glu Pro lie Val Gly Ala Glu Thr Phe Tyr Val 



4 8 0 



4 90 



agc agg gag act aaa tta gga aaa gca gga tat gtt 1536
Ser Arg Glu Thr Lys Leu Gly Lys Ala Gly Tyr Val
500 505 510

act aat aga gga aga caa aaa gtt gtc acc cta act gac aca aca aat 1584
Thr Asn Arg Gly Arg Gln Lys Val Val Thr Leu Thr Asp Thr Thr Asn
515 520 525

cag aag act gag tta caa gca att cat cta gct ttg cag gat tcg gga 1632
Gln Lys Thr Glu Leu Gln Ala Ile His Leu Ala Leu Gln Asp Ser Gly
530 535 540

tta gaa ata aat ata gta aca gac tca caa tat gca tta gga atc att 1680
Leu Glu Val Asn Ile Val Thr Asp Ser Gln Tyr Ala Leu Gly Ile Ile
545 550 555 560

caa gca caa cca gat aaa agt gaa tca gag tta gtc aat caa ata ata 1728
Gln Ala Gln Pro Asp Lys Ser Glu Ser Glu Leu Val Asn Gln Ile Ile
565 570 575 

gag cag tta ata aaa aag gaa aag gtc tat ctg gca tgg gta cca gca 1776
Glu Gln Leu Ile Lys Lys Glu Lys Val Tyr Leu Ala Trp Val Pro Ala
580 585 590 

25 cc gga att gga gga aat gaa caa gta gat aaa tta gtc agt get 1824 

595 600 605

qaa a-- agg aaa gta eta ttt tta gat gga ata gat aag gec caa gat 1872 
Gly Ile Arg Lys Val Leu Phe Leu Asp Gly Ile Asp Lys Ala Gln Asp
610 615 620

gaa cat gag aaa tat cac agt aat tgg aga gca atg gct agt gat ttt 1920
Glu His Glu Lys Tyr His Ser Asn Trp Arg Ala Met Ala Ser Asp Phe
630 635 640






aaa qaa ata gta gcc age tgt gat ad 
l.vs Glu He Va] Ala Ser Cys Asp 1^ 



ta aaa ggn gaa gcc atg cat gga caa gta gac tgt agt cc 
Cys Gln Leu Lys Gly Glu Ala Met His Gly Gln Val Asp Cys Ser Pro
660 665 670



tgg caa cta gat tgt aca cat tta gaa gga
gtt atc ctg 2064
Gly Ile Trp Gln Leu Asp Cys Thr His Leu Glu Gly Lys Val Ile Leu
675 680 685

gta gca gtt cat gta gcc agt gga tat ata gaa gca gaa gtt att cca
Val Ala Val His Val Ala Ser Gly Tyr Ile Glu Ala Glu Val Ile Pro
690 695 700



gca gaa aca ggg cag gaa aca gca tac ttt ctt tta aaa tta gca gga
Ala Glu Thr Gly Gln Glu Thr Ala Tyr Phe Leu Leu Lys Leu Ala Gly
705 710 715 720



aga tgg cca gta aaa aca ata cat aca gac aat ggc agc aat ttc acc
Arg Trp Pro Val Lys Thr Ile His Thr Asp Asn Gly Ser Asn Phe Thr
725 730 735 

agt act acg gtt aag gcc gcc tgt tgg tgg gcg gga atc aag cag gaa
Ser Thr Thr Val Lys Ala Ala Cys Trp Trp Ala Gly Ile Lys Gln Glu
740 745 750 



ttt gga att ccc tac aat ccc caa agt caa gga gta gta gaa tct atg 
Phe Gly Ile Pro Tyr Asn Pro Gln Ser Gln Gly Val Val Glu Ser Met
755 760 765 



gaa tta aag aaa att ata ggc cag gta aga g
g gct gaa 2352
Asn Lys Glu Leu Lys Lys Ile Ile Gly Gln Val Arg Asp Gln Ala Glu



770 775 780 

cat ctt aag aca gca gta caa atg gca gta ttc atc cac aat ttt aaa 2400
His Leu Lys Thr Ala Val Gln Met Ala Val Phe Ile His Asn Phe Lys

785 790 795 800 

aga aaa ggg ggg att ggg ggg tac agt gca ggg gaa aga ata gta gac 2448
Arg Lys Gly Gly Ile Gly Gly Tyr Ser Ala Gly Glu Arg Ile Val Asp

805 810 815 






atl , ate; gca aca gac ata caa act aaa gaa t.ta caa aaa caa o,u 

Ile Ile Ala Thr Asp Ile Gln Thr Lys Glu Leu Gln Lys Gln Ile Thr

820 825 830

aaa att caa aat ttt cgg gtt tat tac agg gac agc aga gat cca ctt 2544
Lys Ile Gln Asn Phe Arg Val Tyr Tyr Arg Asp Ser Arg Asp Pro Leu

835 840 845 



tgg aaa gga cca gca aag ctc
tgg aaa ggt gaa ggg gca gta gta 2592
Trp Lys Gly Pro Ala Lys Leu Leu Trp Lys Gly Glu Gly Ala Val Val



ata caa gat aat 

Ile Gln Asp Asn Ser Asp Ile Lys Val Val Pro Arg Arg Lys Ala Lys
865 870 875 880

atc att agg gat tat gga aaa cag atg gca ggt gat gat tgt gtg gca 2688
Ile Ile Arg Asp Tyr Gly Lys Gln Met Ala Gly Asp Asp Cys Val Ala
885 890 895 



agt aga cag gat gag gat tag 
Ser Arg Gin Asp Glu Asp 
900 



<210> 43 

<211> 902

<212> PRT

<213> Human immunodeficiency virus type 1 



<400> 43

Met Ile Gly Gly Ile Gly Gly Phe Ile Lys Val Arg Gln Tyr Asp Gln
1 5 10 15

Ile Leu Ile Glu Ile Cys Gly His Lys Ala Ile Gly Thr Val Leu Val
20 25 30



Gly Pro Thr Pro Val Asn Ile Ile Gly Arg Asn Leu Leu Thr Gln Ile
35 40 45

Gly Cys Thr Leu Asn Phe Pro Ile Ser Pro Ile Glu Thr Val Pro Val
50 55 60

Lys Leu Lys Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro Leu
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Glu Gly i.ys lie Ser Lys lie Gly Pro Glu Asn Pro Tyr Asn Thr 

5 

115 

130 

Gln Leu Gly Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser Val
145 150 155

Thr Val Leu Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp Glu

165



Asp Phe Arg Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn Glu
180 185



195 



210 



215 



Phe Arg Lys Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp

225 230 235

Leu Tyr Val Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile

-, PS 270 

25 260 

Lys Lys His Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu Leu

275 280 

3on 



295 



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lie C.'i n Lys Leu Vol Ciy Lys 



Ala Ser Gln Ile Tyr Pro Gly Ile Lys Val Arg Gln Leu Cys Lys Leu
325 330



Leu Arg Gly Thr Lys Ala Leu Thr Glu Val Ile Pro Leu Thr Glu Glu
340 345 350

Ala Glu Leu Glu Leu Ala Glu Asn Arg Glu Ile Leu Lys Glu Pro Val
355 360 365

His Gly Val Tyr Tyr Asp Pro Ser Lys Asp Leu Ile Ala Glu Ile Gln
370 375 380

Lys Gln Gly Gln Gly Gln Trp Thr Tyr Gln Ile Tyr Gln Glu Pro Phe
385 



Lys Asn Leu Lys Thr Gly Lys Tyr Ala Arg Thr Arg Gly Ala His Thr
405 410 415

Asn Asp Val Lys Gln Leu Thr Glu Ala Val Gln Lys Ile Thr Thr Glu
420 425 430

Ser Ile Val Ile Trp Gly Lys Thr Pro Lys Phe Lys Leu Pro Ile Gln
435 440 445

Lys Glu Thr Trp Glu Thr Trp Trp Thr Glu Tyr Trp Gln Ala Thr Trp
450 455 460



Ile Pro Glu Trp Glu Phe Val Asn Thr Pro Pro Leu Val Lys Leu Trp
465 470 475 480

„, . . .. o „, , r ..,„ T ,_ vH Glv Ala Glu Thr Phe Tyr Val 



490 



495 



Asp Gly Ala Ala Ser Arg Glu Thr Lys Leu Gly Lys Ala Gly Tyr Val

505 510 



Thr Asn Arg Gly Arg Gln Lys Val Val Thr Leu Thr Asp Thr Thr Asn
515 520

Gln Lys Thr Glu Leu Gln Ala Ile His Leu Ala Leu Gln Asp Ser Gly
530 535 540



Leu Glu Val Asn Ile Val Thr Asp Ser Gln Tyr Ala Leu Gly Ile Ile

Gln Ala Gln Pro Asp Lys Ser Glu Ser Glu Leu Val Asn Gln Ile Ile
575



Glu Gln Leu Ile Lys Lys Glu Lys Val Tyr Leu Ala Trp Val Pro Ala

585 590

His Lys Gly Ile Gly Gly Asn Glu Gln Val Asp Lys Leu Val Ser Ala
595 600 605

Gly Ile Arg Lys Val Leu Phe Leu Asp Gly Ile Asp Lys Ala Gln Asp
610 615 620



Glu His Glu Lys Tyr His Ser Asn Trp Arg Ala Met Ala Ser Asp Phe
635 640

Asn Leu Pro Pro Val Val Ala Lys Glu Ile Val Ala Ser Cys Asp Lys
645 650 655

Cys Gln Leu Lys Gly Glu Ala Met His Gly Gln Val Asp Cys Ser Pro

Gly Ile Trp Gln Leu Asp Cys Thr His Leu Glu Gly Lys Val Ile Leu
675 680 685

Val Ala Val His Val Ala Ser Gly Tyr Ile Glu Ala Glu Val Ile Pro
690 695 700



Ala Glu Thr Gly Gln Glu Thr Ala Tyr Phe Leu Leu Lys Leu Ala Gly
710



Gly Ser Asn Phe Thr

Ser Thr Thr Val Lys Ala Ala Cys Trp Trp Ala Gly Ile Lys Gln Glu
740 745

Phe Gly Ile Pro Tyr Asn Pro Gln Ser Gln Gly Val Val Glu Ser Met
755

Asn Lys Glu Leu Lys Lys Ile Ile Gly Gln Val Arg Asp Gln Ala Glu

770 775 780




nnc, 8 00 



805 810 815 

Ile Ile Ala Thr Asp Ile Gln Thr Lys Glu Leu Gln Lys Gln Ile Thr

820 825 830

Lys Ile Gln Asn Phe Arg Val Tyr Tyr Arg Asp Ser Arg Asp Pro Leu



Trp Lys Gly Pro Ala Lys Leu Leu Trp Lys Gly Glu Gly Ala Val Val
850 855 860

Ile Gln Asp Asn Ser Asp Ile Lys Val Val Pro Arg Arg Lys Ala Lys
870 875 880

Ile Ile Arg Asp Tyr Gly Lys Gln Met Ala Gly Asp Asp Cys Val Ala

Ser Arg Gln Asp Glu Asp
900 



<210> 44

<400> 44 
000 

<210> 45

<211> 62

<212> DNA

<213> Artificial Sequence 



<220> 

<223> Description of Artificial Sequence: synthetic DNA

- RM1BAhisXhoI extend

accccgatca atccgctcga gttagtggtg gtggtggtgg tgtttagcct ctctcaaggg 60



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- : /.Hid.-jal Sequence 
<220>

<223> Description of Artificial Sequence: synthetic DNA
- RM1BA His 10

<400> 46

accccgatca atccgctcga gttagtggtg gtggtggtgg tggtggtggt ggtgtttagc 60

ctctctcaag ggat 74



<210> 47 

<211> 80
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: synthetic DNA 
- RM1BA His 12 

<400> 47 

accccgatca atccgctcga gttagtggtg gtggtggtgg tggtggtggt ggtggtggtg 60 

tttagcctct ctcaagggat 80



<210> 48
<211> 61
<212> DNA

<220>

<223> Description of Artificial Sequence: synthetic DNA
- RM1BA Leu

<400> 48

accccgatca atccgctcga gttagaggag tagtaggagt agtttagcct ctctcaaggg 60






-.'J I 1 • C 1 



Description of Artificial Sequence: synthetic DNA

accccgatca atccgctcga gttacttctt cttcttcttc tttttagcct ctctcaaggg 60



<210> 50 

<211> 62 

<212> DNA 

<213> Artificial Sequence

- RM1BA Arg6



<400> 50 

accccgatca atccgctcga gttaacgacg acgacgacga cgtttagcct ctctcaaggg 



<210> 51 
<211> 65 
<212> DNA 

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: synthetic DNA
- RM1BA Arg3X4

<400> 51

accccgatca atccgctcga gttaacgatt acgattacgc tgatatttag cctctctcaa 60

gggat



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;;alrtlon ot Artificial Sequence: synthetic DMA 
RM1BA Asp 6 



ccctc.a gttaatcgtc gtcatcgtca tctttagcct ctctcaaggg 60 



<212> DNA

<213> Artificial Sequence

<223> Description of Artificial Sequence: synthetic DNA

- RM1BA Asp4 



qd gttaatcgtc atcgtcttta 



cctctctca aggga 



20 ■:•](>■ 'j -i 
• :■ i : .• [niA 



<223> Description of Artificial Sequence: synthetic DNA
RM1HA Asp r ' 

uca atccgct.cca gttaatcgtc atcgtcatct ttagcctctc tcaaggga 58 



30 <2ii> <n 

<212> DNA

accccgatca atccgctcga gttaatcgtc gtcatcgtca tcgtcatctt tagcctctct 60



<210> 56

<211> 79

<212> DNA

<213> Artificial Sequence

- RM1BA Asp12
<400> 56

accccgatca atccgctcga gttaatcgtc gtcatcgtca tcgtcatcgt catcgtcatc 60

tttagcctct ctcaaggga 79



<210> 57 

<211> 61

<212> DNA

<213> Artificial Sequence

<220>

<223> Description of Artificial Sequence: synthetic DNA
- RM1BA Glu6 XhoI

<400> 57

accccgatca atccgctcga gttattcctc ttcctcttcc tctttagcct ctctcaaggg 60

<210> 58
<211> 79 



<400> 58

accccgatca atccgctcga gttattcctc ttcctcttcc
tttagcctct ctcaaggga 



<210> 59 
<211> 44

<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: synthetic DNA
- RM1BA XhoI

<400> 59

accccgatca atccgctcga gttatttagc ctctctcaag ggat



<210> 60
<211> 63
<212> DNA

<213> Artificial Sequence

<223> Description of Artificial Sequence: synthetic DNA
- RM1BK 620 his

<400> 60

accccgatca atccgctcga gttagtggtg gtggtggtgg tgggcctcca acgcaggggc 



<210> 61

<211> 64
<212> DNA



quonce: synthetic VI'..-- 



dL , cg ., t , qa g tLa cagttt ctgtgtttca gcctgtttag cctctctcaa 60 



<212> DNA

<213> Artificial Sequence



Description of Artificial Sequence: synthetic DNA 
- RM1BA LZIP 3 XhoI

<400> 62

accccgatca atccgctcga gttaaagcag gtcaatctca gagatcagtt tctgtgtttc



agcctgttta gcctctctca aggga



<212> DNA

<213> Artificial Sequence

- RM1BA LZIP 4 XhoI
<400> 63



tccq'.-li.-q 



tc:q.. gttacagctg ctcgttctgt ttacgaagca ggtcaatctc 



<210> 64 
<211> 87 



<223> Description of Artificial Sequence: synthetic DNA
- RM1BA LZIP 5 XhoI

<400> 64

accccgatca atccgctcga gttacagctg ttccagcttg tgtttcagct gctcgttctg

tttacgaagc aggtcaatct cagagat



<210> 65 
<211> 49
<212> DNA

<213> Artificial Sequence

<223> Description of Artificial Sequence: synthetic DNA
- RM1BA Cyst 2

<400> 65

accccgatca atccgctcga gttagcaaca tttagcctct ctcaaggga



<210> 66 
<211> 61 
<212> DNA

<213> Artificial Sequence

<220>

<223> Description of Artificial Sequence: synthetic DNA

accccgatca atccgctcga gttagcaaca gcaacagcaa catttagcct ctctcaaggg 60

<210> 67
<211> 55
<212> DNA

accccgatca atccgctcga gttaaggacg aggaccttta gcctctctca aggga

<210> 68

<211> 55

<212> DNA

<213> Artificial Sequence



<223> Description of Artificial Sequence: synthetic DNA 
- RM1BA PRPG 

<400> 68 

accccgatca atccgctcga gttaaccagg acgaggttta gcctctctca aggga



<210> 69 
<211> 73 
<212> DNA 

<213> Artificial Sequence 



<223> Description of Artificial Sequence: synthetic DNA
- RM1BA WH 

<400> 69 

ccgatcaatc cgctcgagtt atgcdcgag:. atgtg~~g- ■ 



ctctctcaag gga

<210> 70
<211> 70
<212> DNA

<213> Artificial Sequence
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Description of Artificial Sequence: synthetic DI1A 
- RM1BA 3 PPG XhoI



laccaqq aggaccagg< 



qgaccaggag gtttagc 



5 tctcaaggga 



<210> 71
<211> 61
<212> DNA

<213> Artificial Sequence

<223> Description of Artificial Sequence: synthetic DNA
- RM1BA TRP

<400> 71

accccgatca atccgctcga gttaccacca ccaccaccac catttagcct ctctcaaggg

<210> 72
<211> 66
<212> DNA 

<213> Artificial Sequence 
<220> 



ataaggg 
25 ctggct- 



<223> Description of Artificial Sequence: synthetic DNA
- FM1BA Nhis SmaI

gl tctccc egggatgeae caccaccacc accacactgt tgegctacat 60 



DNA 

Arti f ic 



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- 96 - 
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r-j.fic.ial Scqu 



ynlhetic DNA 



n .:cactg ttctccccgg gatggctcgt gcatgggaag ctgcacatgc tcgtgcaact 



gttgcqctac atctg 



<210> 74

<211> 42

<212> DNA

<213> Artificial Sequence

<223> Description of Artificial Sequence: synthetic DNA
- RM1BK 620

<400> 74

accccgatca atccgctcga gggcctccaa cgcaggggct ga

<210> 75
<400> 75
000

<210> 76
<211> 45
<212> DNA

<213> Artificial Sequence

<223> Description of Artificial Sequence: synthetic DNA
- RM1BK 640 XhoI

<400> 76

gcaatgtatg ccctcaatct cgagaagtgt aaagtctgtc tgcca 



<210> 77

<211> 47

<212> DNA

<213> Artificial Sequence

- RM1BK 660 XhoI

ctcaatct cgagtatcgc cgatgaggcg gtatcca



<210> 78

<211> 44

<212> DNA

<213> Artificial Sequence

<220>

<223> Description of Artificial Sequence: synthetic DNA
- RM1BK 680 XhoI

<400> 78

gcaatgtatg ccctcaatct cgagagccgt ggcccaatga tgtt

<210> 79
<211> 45
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: synthetic DNA
- RM1BK 760 XhoI

<400> 79

gcaatgtatg ccctcaatct cgagcagttc cccctgtttg ctggt

<210> 80
<211> 46
<212> DNA

<213> Artificial Sequence

<220>

<223> Description of Artificial Sequence: synthetic DNA
- RM1BK 800 XhoI



gcaatgtatg ccctcaatct cgagtcgtat tttaaccagg ggtcct 46

<212> DNA
<213> Artificial Sequence

<220>

<223> Description of Artificial Sequence: synthetic DNA
- RM1BK 640 His XhoI

<400> 81

gcaatgtatg ccctcaatct cgagttaatg gtgatggtga tggtgaagtg taaagtctgt 60



<210> 82 

<211> 68 

<212> DNA 

<213> Artificial Sequence

<220>

<223> Description of Artificial Sequence: synthetic DNA
- RM1BK 660 His XhoI

<400> 82

gcaatgtatg ccctcaatct cgagttaatg gtgatggtga tggtgtatcg ccgatgaggc 60

<211> 65
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: synthetic DNA
- RM1BK 680 His XhoI

<400> 83 

gcaatgtatg ccctcaatct cgagttaatg gtgatggtga tggtgagccg tggcccaatg 60 



<211> 67
<212> DNA

<213> Artificial Sequence

<223> Description of Artificial Sequence: synthetic DNA
- RM1BK 800 His XhoI

<400> 84

gcaatgtatg ccctcaatct cgagttaatg gtgatggtga tggtgtcgta ttttaaccgg 60



<210> 85
<211> 45
<212> DNA

<213> Artificial Sequence

<223> Description of Artificial Sequence: synthetic DNA
- F Cint XhoI

<400> 85

gcaatgtatg ccctcaatct cgagcacttt gagcgtggtg aaaac

<210> 86

<211> 51

<212> DNA

<213> Artificial Sequence

<220>

<223> Description of Artificial Sequence: synthetic DNA

- U Cint SalI

<400> 86

gcctgcaaaa agagggctcg cgtcgaccgc ctcatctttc ttagtcacct c 51



<223> Description of Artificial Sequence: synthetic DNA
nt His SalI

<400> 87

gcctgcaaaa agagggctcg cgtcgactta gtggtggtgg tggtggtgcg cctcatcttt 60



<210> 88
<211> 45
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: synthetic DNA
- F Cint 731 SalI

<400> 88

gcaatgtatg ccctcaatgt cgacgccaac cggctcctga aagat 45

<210> 89

<211> 45
<212> DNA

<213> Artificial Sequence

<223> Description of Artificial Sequence: synthetic DNA
- FCint751 SalI

<400> 89

gcaatgtatg ccctcaatgt cgacatgaaa agaatcccca ccagc

<210> 90
<211> 51
<212> DNA



nch>u:c DKP 



aa ugagggctrg cctcgagtta cttatcagtg tccctgtttt t 



[IN A 



<213> Artificial Sequence

<223> Description of Artificial Sequence: synthetic DNA
- RCint 830 his XhoI

<400> 91

gcctgcaaaa agagggctcg cctcgagtta atggtgatgg tgatggtgct tatcagtgtc 60

<212> DNA
<213> Artificial Sequence

<223> Description of Artificial Sequence: synthetic DNA
- FDNPCR1 (D450N)

<400> 92

tgccgggacc cactgtcttt actaacgcct cctcaagcac ccataa



DNA 

Artitici 



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WO 00/42199 - 102 - 



Description of Artificial Sequence: synthetic DNA
- RDNPCR1 (D450N)



ttatgggtgc ttgaggaggc gttagtaaag acagtgggtc ccggca 



<210> 94
<211> 46
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: synthetic DNA
- FDNPCR2 (D505N)



<400> 94 

acgcccacta atgtagtgac taactccgcg tttgttgcga aaatgt 



<210> 95 
<211> 46
<212> DNA

<213> Artificial Sequence

<223> Description of Artificial Sequence: synthetic DNA
- RDNPCR2 (D505N)



<400> 95 

acattttcgc aacaaacgcg gagttagtca ctacattagt gggcgt 



<210> 96 

<211> 48

<212> DNA

<213> Artificial Sequence

<223> Description of Artificial Sequence: synthetic DNA
- FM1BA E484Q

<400> 96

ttgggggcaa gtgtacaaca actgcaagca cgcgctgtgg ccatggca

<213> Artificial Sequence

<223> Description of Artificial Sequence: synthetic DNA
- RM1BA E484Q

<400> 97

tgccatggcc acagcgcgtg cttgcagttg ttgtacactt gcccccaa 



<210> 98 
<211> 102
<212> DNA 

<213> Artificial Sequence 



<223> Description of Artificial Sequence: synthetic DNA 
- RM1BA NS1 Xhol 



<400> 98 
acqatcaatc 



;g ctc g agtt attttgtttg gatacgtcca cgttttgttt cttgtgcggt 



etc tctcaaggga ta 



aeggtt gtttctactg ttttagc 



<210> 99
<211> 45
<212> DNA

<213> Artificial Sequence

<223> Description of Artificial Sequence: synthetic DNA

- F Cint 771 SalI

<400> 99

acaatgtatg ccctcaatgt cgacgagcgt ggtgaaaaca caaaa

<210> 100
<211> 66

<223> Description of Artificial Sequence: synthetic DNA
- RM1BK 760 His XhoI

gcaatgtatg ccctcaatct cgagttaatg gtgatggtga tggtgcagtt
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